And my name is Daniel Bohannon. I've been doing incident response consulting with Mandiant for the past two years, and I actually recently switched over to a senior applied security researcher position, so I'm getting to do a lot more of this stuff, which is really exciting. I'm also the author of the Invoke-Obfuscation and Invoke-CradleCrafter PowerShell obfuscation frameworks. Thanks, Debo. Alright, so to set the stage here, we're going to talk a little bit about a treatise on blue team follies. As it stands today, the majority of people just use regular command line logging. They look for processes like, for example, PowerShell launching with malicious encoded commands. What we're going to do is go down into some of the depths of how that can be dangerous, especially when you've got some bad assumptions going on. Of course, you can't get any of this intelligence unless you've got logging enabled. So for sure, make sure that you've got command line logging enabled. That's event 4688, and you can also get it from Sysmon. Also make sure that you've got PowerShell script block logging enabled, because PowerShell version 5 brought in tons and tons of awesome security features, and attackers aren't in the habit of enabling this stuff for you. So if you care about it, you might as well do it first. So let's talk about what people do when they're actually trying to detect malicious PowerShell. Maybe they say: whenever I see PowerShell -Command, I'm going to take a look at those encoded arguments, try to figure them out, and write some detections based on them. Well, PowerShell, as helpful as ever, is the best in the business. It turns out that you don't have to give direct input to PowerShell -Command.
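A minimal Python sketch of the defender-side step described above: PowerShell's -EncodedCommand argument is Base64 over UTF-16LE text, so decoding it takes two steps. The function names are hypothetical, and the sketch assumes the argument has already been pulled out of the 4688 or Sysmon command line.

```python
import base64


def decode_encoded_command(blob: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(blob).decode("utf-16-le")


def encode_command(script: str) -> str:
    """Inverse operation, useful for building test cases for a detection."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


if __name__ == "__main__":
    blob = encode_command("Get-ChildItem")
    print(blob)
    print(decode_encoded_command(blob))  # Get-ChildItem
```

This only recovers what was actually passed on the command line, which is exactly the assumption the rest of the talk picks apart.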
You can also send those commands in through standard input, like many other automation tools you're used to. Here's an example of using this technique. If I'm taking some PowerShell commands and dumping them in through PowerShell's standard input, and all you're looking at is the content of the PowerShell command line, you're going to miss the stuff that went in there. However, we're a little bit okay. This is what it looks like when you look at the command line logs for PowerShell itself: you don't see the actual command that was run. One little note here: this little blue shield marks stuff that will still show up in your PowerShell logs. So while this won't show up in 4688, it will show up in the PowerShell logs. That looks kind of bad, but what you do see is that if you look at the parent process command line, the CMD itself, somebody was trying to jam a bunch of input into PowerShell. So maybe you start thinking about doing some detections based on that. Here's a question: do we start to key on any time CMD is calling PowerShell? That might be useful. But here's the thing: in CMD, you can also declare in an environment variable what process it should call. Here's an example of setting PowerShell as chunks in two environment variables and then having CMD call that. You can do all kinds of obfuscation with environment variables, so if you're just doing detection based on CMD that way, you're definitely at some risk. This is an example from FIN8, a financially motivated attacker, doing a bunch of this stuff embedded in VBA. Ring a bell? Yes, absolutely; that's what you just saw. They're using late-bound command piping into PowerShell to get their stuff done and avoid this command line detection. But you don't have to send things in only through standard input.
Here's an example at the bottom of just using an environment variable directly and having PowerShell invoke the content of that environment variable. Kovter uses this as well. So if you're only looking at PowerShell command lines, you're kind of screwed. What about the clipboard? Here's a good example of just using the clipboard as an information passing mechanism. You do see these things in the command lines, but this is starting to get really, really tough. So you might say: what if we just start applying detection whenever CMD is calling PowerShell? Maybe I'll start doing detections based on that process chain. That's the kind of thing you see a lot of times in HIDS. But here's the thing. When I run this sort of launch technique and dump a thing into PowerShell, what if, in a parent CMD, I set an environment variable, I tell a child CMD to run the thing that was in the environment variable, and that thing is what actually calls PowerShell? If you were only looking for CMD calling PowerShell, yeah, you're going to miss that. So this would be kind of cool if it works. No, not quite. Nearly, but unfortunately we're not going to get there. However, with the magic of CMD escaping, all I need to do is escape that one last pipe so that the initial CMD stops interpreting it. There we go. We've got CMD, then that second CMD where you can't tell what it's doing, and then finally somehow PowerShell gets involved. So obviously what we're going to do is recursively check all the way up the parent process tree, combine all those command line arguments, get rid of all that obfuscation, and figure it out. That's the natural thing to do, and we're going to save the day. Unfortunately, not. There are a ton of ways to give input to PowerShell that do not involve any sort of parent process chain.
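The "recursively check up the parent process tree" idea can be sketched in Python as follows. The Process record and the sample command lines are hypothetical stand-ins for whatever a 4688 or Sysmon pipeline produces; the useful part of the sketch is what it cannot see, since input passed through a sibling window title, a file, or a registry key never appears anywhere in the chain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Process:
    pid: int
    parent_pid: Optional[int]
    command_line: str


def ancestor_command_lines(pid: int, table: Dict[int, Process]) -> List[str]:
    """Return command lines from the oldest known ancestor down to `pid`."""
    chain: List[str] = []
    seen = set()
    current = table.get(pid)
    while current is not None and current.pid not in seen:
        seen.add(current.pid)  # guard against loops caused by PID reuse
        chain.append(current.command_line)
        parent = current.parent_pid
        current = table.get(parent) if parent is not None else None
    chain.reverse()
    return chain


# Hypothetical process table mirroring the CMD -> CMD -> PowerShell example.
TABLE = {
    100: Process(100, None, 'cmd.exe /c "set x=Write-Host hi&& cmd /c echo %x% ^| powershell -"'),
    200: Process(200, 100, "cmd /c echo %x% | powershell -"),
    300: Process(300, 200, "powershell -"),
}

if __name__ == "__main__":
    # Combined view a defender might feed into further analysis.
    print("\n".join(ancestor_command_lines(300, TABLE)))
```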
What if, for example, you launch one window and set the commands in the window title, then launch another window as a kind of sibling to scrape that window title? You can use files, you can use registry keys, any sort of information passing mechanism you can imagine beyond environment variables, and that's going to blow through any sort of detection based on the parent process tree. The good news, as we mentioned a little bit back there, is that PowerShell script block logging will catch all of these shenanigans. But if you're in an environment where the defenders are only looking at command line logs, you're done. However, there's another thing. Up to this point we've just been talking about process command chains. We haven't talked at all about the PowerShell content itself and the ways through that. Yes, let's look at the content itself. So again, let's say you're really on top of things: you're looking at all process command lines, and you're also taking PowerShell script block logging into account. Are there any pitfalls we need to be aware of as defenders, and are there any tools you can use as a red teamer to evade whatever detection you're up against? We're going to take a very rapid-fire example looking at what we call the remote download cradle. This remote download cradle at the top here is basically copied and pasted everywhere. It's in all the major frameworks you can think of. Attackers love this stuff. It's a one-liner that downloads a remote script into memory and then passes it to Invoke-Expression, or IEX, which is basically PowerShell's eval statement, if you will. So let's play a little red team, blue team here. We have our attacker command at the top, this remote download cradle.
And as a defender, what if we say: okay, I'm interested whenever I see Invoke-Expression, New-Object, System.Net.WebClient, and DownloadString('http. Would this catch this command? It will catch that command right here, but let's go through a little obfuscation exercise and see what pitfalls are out there. First, whenever we see System.Net, that System. prefix is really not necessary in PowerShell. PowerShell will automatically prepend System. to the .NET class name, so an attacker doesn't have to have it in their command, and as defenders we definitely don't want to make that assumption in our trigger terms down here. So we're going to remove it from both. Next, the URL. This is a string, so you can do stuff like concatenate it inline. You're also not limited to double quotes; you can use single quotes. You can put whitespace there. You can also set it as a variable elsewhere. There are a lot of things you can do here, so we're just going to remove the http part from the DownloadString portion of the detection. Let's keep going: DownloadString. DownloadString is actually one of many methods in the Net.WebClient class. It's the most common one we see attackers using, but it's definitely not the only one. This is just part of the list: DownloadString, DownloadFile, DownloadData, each returning in a different format: a string, a file on disk, and, for DownloadData, a byte array. So maybe as a defender we'd say, okay, let's just shorten it to .Download and make sure we capture all these options. That's what we'll do. Also, this parenthesis isn't really necessary. We can start to chop up this PowerShell command and set pieces of it in variables. For example, some frameworks out there will put New-Object Net.WebClient in a variable, typically called $wc for web client, and then you just have $wc.DownloadString.
So let's remove that parenthesis from the .Download portion of the trigger. Now, from a PowerShell perspective, why might this dot be problematic? How can an attacker get around .Download to evade the detection we have? Well, from a PowerShell token perspective, DownloadString is actually a member. And one thing we can do with member tokens in a PowerShell command is throw single quotes around them, or double quotes. And if you look really closely at DownloadString (I promise this next slide works), what we can do is add a tick mark, and that still runs. Now, why the tick mark? What is this? Well, the tick mark, or grave accent, is PowerShell's escape character. I like to think of it as the grave of a lot of defensive ideas I thought were really good, which ended up getting broken by this. We can actually put these in front of any character that has no escapable meaning. You see the eight characters here: for example, `0 is null, `n is a newline, that sort of thing. As long as we place a tick before something that's not one of those characters, we're good. And if you're like me and really OCD and really want to put a tick in front of one of those characters, all you have to do is uppercase it, and it totally works. So now we can put ticks wherever we want in the method name, as long as we have the double quotes. Now here's the scary part: those ticks are in the command line logs. If you have some real-time agent, they're in the command line argument details themselves, and they actually persist all the way into PowerShell script block logs. The place where this obfuscation doesn't have any effect is PowerShell module logging, event ID 4103. So PowerShell's logging is really, really robust. Sometimes you just have to look in a lot of different places, but all the evidence is there.
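A minimal Python sketch of one normalization a defender might try on an isolated member token, following the rules described above: a tick before one of the eight lowercase escape letters is a real escape sequence and is left alone, while a tick before anything else is simply dropped. The helper name is hypothetical, it assumes the token has already been extracted from its quotes, and it does nothing about the concatenation and reordering tricks that come next.

```python
# The eight escape letters described in the talk (`0 `a `b `f `n `r `t `v).
ESCAPES = set("0abfnrtv")


def strip_ticks(token: str) -> str:
    """Remove tick marks that PowerShell would ignore inside a quoted member name."""
    out = []
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "`" and i + 1 < len(token):
            nxt = token[i + 1]
            if nxt in ESCAPES:
                out.append(ch + nxt)  # a real escape sequence: keep it for review
            else:
                out.append(nxt)  # tick before an ordinary character: drop the tick
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_ticks("Do`wnl`oa`DSt`Ring"))  # DownloaDStRing
```

Comparing the result case-insensitively against the indicator then catches the tick-and-casing variant, but nothing more.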
So as defenders, we can try to regex all this stuff to catch all these tick marks, or maybe we should just give up? I don't know. If you're really brave and want to write a regex, make sure you keep in mind the OpenRead method, which returns a stream instead of a byte array or a string. However, I wouldn't recommend a regex, because if you throw parentheses around the member name, you can then treat it as a legitimate string and start doing concatenation, set it in chunks in variables, reverse it, whatever you want. And this .Invoke that you see in these two examples is only required for PowerShell 2; it's not required for PowerShell 3 or later. If you look at Invoke-Obfuscation, it uses that .Invoke so the output works on PowerShell 2 and later, but you actually don't have to have it. So as a defender, make sure you're not basing your defenses on that .Invoke portion. Let's just remove that from our indicators. Now Net.WebClient, really briefly. From a PowerShell token perspective, it's an argument to the New-Object cmdlet. We can use double quotes and tick marks, we can put parentheses around it and concatenate it, chunk it into variables, tons of options. We'll just go with the first one. There we go. New-Object. PowerShell is super inviting to newcomers learning the language because there are so many aliases. For example, if you want to list the files in a directory, you can use PowerShell's Get-ChildItem. If you're lazy, you can just say gci. If you come from a Windows background, you can type dir. If you come from a Linux background, you can type ls, and it all works. So as defenders, we have to be really careful to make sure we understand all the options available in PowerShell, just from a pure syntax perspective, regardless of any kind of obfuscation. The nice thing is that New-Object has zero aliases.
So initially I thought, hey, this is going to be a really solid indicator for defenders. However, PowerShell is really good at helping you find stuff that you know is out there but can't remember the name of. For example, if I'm looking for a command that's New-P-something, I can just type Get-Command New-P* with a wildcard, and it will return all of the matching commands as PowerShell objects. If it returns just a single object, I can actually pass it to Invoke-Expression, which will automatically convert that cmdlet to its name as a string and then invoke it. However, as attackers we can be a little more creative than this. Instead of Invoke-Expression, why don't we use a dot or an ampersand? These are invocation operators, and when you use one, it takes the object returned from Get-Command and invokes it. And we can have even more fun. Remember those wildcards? That is New-Object. As is that. And as many combinations as you can think of: as long as Get-Command returns just New-Object, that sucker's going to run. And as a defender, you're not seeing New-Object anywhere in command lines or in script block logging. Pretty crazy, right? It actually doesn't stop there, because Get-Command has the alias gcm. And, if you promise not to tell anyone, there's also an undocumented alias for Get-Command, and that is just Command. Because PowerShell, again, is your best friend. It's really, really helpful. It doesn't want to make you look silly. So if you just type command, it's going to check: hey, is there a Get-Command? There is. Solid, that's what you were looking for, and it works. So any time you're building defenses based on Get-something, make sure you also account for the form without the Get- prefix. In addition, for Get-Command, if you don't want to use wildcards, you can set the command name in variables like this, using PowerShell 1.0 syntax.
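To see why a literal New-Object indicator fails, here is a small Python sketch that resolves a wildcard pattern against a command list, case-insensitively, the way the talk describes Get-Command doing. The COMMANDS list is a hypothetical, heavily abbreviated stand-in for a real session, and fnmatch only approximates PowerShell's wildcard rules (both support *, ?, and [] ranges).

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical, abbreviated inventory; a real session exposes far more commands.
COMMANDS = [
    "Get-ChildItem", "Get-Command", "New-Item",
    "New-Object", "New-PSDrive", "New-Variable",
]


def resolve(pattern: str, commands: List[str] = COMMANDS) -> List[str]:
    """Return every command name matching a wildcard pattern, ignoring case."""
    pat = pattern.lower()
    return [name for name in commands if fnmatchcase(name.lower(), pat)]


def resolves_uniquely_to(pattern: str, target: str) -> bool:
    """True if the pattern would hand back exactly one command: `target`."""
    return [n.lower() for n in resolve(pattern)] == [target.lower()]


if __name__ == "__main__":
    for p in ["New-O*", "*w-o*", "n*-*bj*", "New-*"]:
        print(p, resolve(p))
```

A defender could run observed Get-Command arguments through something like this, but the answer depends on which commands exist on the host, so the list has to come from the real environment.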
If you're a defender and you're not looking for the $ExecutionContext automatic variable, you absolutely want to be, because it is really, really awesome. And if you're a red teamer, you definitely want to check it out. Here are just a couple of ways you can call Get-Command or some of its similar counterparts using this 1.0 syntax we just mentioned. You can do the exact same thing with Get-Alias, which has the alias gal and, again without the Get- prefix, just Alias. You can use that against the alias name instead of the cmdlet name. So there's a lot going on there. Why don't we just choose this gcm w-o example right there? It's getting a little crazy. In addition to all these things, we can throw tick marks in front of these because they're cmdlets. And because the name can be a string, we can concatenate it, or use what's called the -f format operator to reorder the substrings we just chopped up. Some people will say, I'll just remove all the special characters from the event logs: if I remove the quotes and the pluses from that concatenated New-Object, then New-Object comes back together. However, that's not foolproof, because with these reordering techniques you'd also have to reorder the pieces. So we can try to regex all the things, or just give up. I'm going to be a realist here and just pass on this one. So we're left with Invoke-Expression, which is a freaking awesome indicator, especially on the command line. IEX, or Invoke-Expression: you definitely want to be looking for this. What are some things we need to keep in mind with it? Well, it has an alias of IEX, which is what you typically see. Tick marks work because it's a cmdlet. You can use the invocation operators, and concatenation and reordering. And fun fact: as part of our research, which we'll talk about in just a second, we assembled a massive PowerShell corpus, just a lot of scripts.
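The "just strip quotes and pluses" idea, and why reordering defeats it, fits in a few lines of Python. Python's str.format happens to treat simple positional templates like {1}{0}{2} the same way PowerShell's -f operator does, so it stands in for the reordering step here. The helper names are hypothetical, and the regex in evaluate_format handles only the single literal form shown, which is rather the point.

```python
import re


def naive_normalize(text: str) -> str:
    """Drop quotes, plus signs and whitespace so concatenated pieces rejoin."""
    return re.sub(r"['\"+\s]", "", text)


# Matches ("template" -f 'a','b',...) with single-quoted arguments only.
FORMAT_EXPR = re.compile(r"""\(\s*"([^"]*)"\s*-f\s*((?:'[^']*'\s*,?\s*)+)\)""", re.IGNORECASE)


def evaluate_format(text: str) -> str:
    """Evaluate the simplest -f reordering expressions in place."""
    def apply(match: "re.Match[str]") -> str:
        template = match.group(1)
        args = re.findall(r"'([^']*)'", match.group(2))
        try:
            return template.format(*args)
        except (IndexError, KeyError, ValueError):
            return match.group(0)  # leave anything we cannot evaluate untouched
    return FORMAT_EXPR.sub(apply, text)


CONCATENATED = "'New-' + 'Obj' + 'ect'"
REORDERED = """("{1}{0}{2}" -f 'Obj','New-','ect')"""

if __name__ == "__main__":
    print(naive_normalize(CONCATENATED))  # New-Object
    print(naive_normalize(REORDERED))     # pieces stay in the wrong order
    print(evaluate_format(REORDERED))     # New-Object
```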
We'll get into the numbers in a second, but basically only 3% of scripts actually contained IEX or Invoke-Expression. Pretty interesting. But one thing we have to keep in mind is Invoke-Command. It has a -ScriptBlock parameter, and it's typically used to run a command on a remote system, but if you never specify -ComputerName for a remote system, it runs locally. So what does that mean from a defender's perspective? Well, Invoke-Command has the alias icm, the dot and ampersand invocation operators also work, and then you have methods like .Invoke(), .InvokeReturnAsIs(), .InvokeWithContext(), et cetera. A lot of options there. In addition, with PowerShell 1.0 syntax there's that $ExecutionContext thing I was telling you about earlier, right? It has an InvokeScript method, which can handle both expressions and script blocks. So let's add tick marks to all of these because they're cmdlets. But how in the world, as defenders, can we start keying off an ampersand or a dot? That seems really bound to produce false positives. So what if we say: okay, I'm only interested if there's a dot or ampersand and there are also curly braces? Because PowerShell... if only it were that simple. You can convert an expression to a script block. Here are two examples of that: using the ScriptBlock class and its Create method, or, again with $ExecutionContext and PowerShell 1.0 syntax, the NewScriptBlock method. And you can obfuscate all of these just like we've been doing all along. Every single layer can be obfuscated to the extreme, and it sticks: the obfuscation is there in the command line arguments and also in script block logs. In case just a couple months ago. Sorry, Lee. I have to deal with myself actually. So we're in the same boat there. But it actually has over ten different invocation options. So there's a lot of cool stuff there. That is brutal. Can you imagine trying to defend against that? God.
But anyways, now that you've done that, let's kill this brutal. So fortunately, that's really the extent of what you can do with PowerShell obfuscation. No, I'm totally kidding. There's way more. What if, after all of that, we then say: hey, totally f'ed-up PowerShell command, why don't we make you a string, reverse you on the command line, and then reverse you back in memory? Here are some samples of that. We can also put garbage delimiters in the command and then basically split and join to remove them. We can use replace methods to remove and replace those garbage delimiters. We can do any kind of concatenation we want. And wouldn't it be horrible if a tool did all of this by default? That'd be horrible. Horrible. So anyways, Invoke-Obfuscation may or may not do that. So let's take the same download cradle we started with, and instead of going through this, I don't know, 10-minute example of all these different things, we can literally just say, at the click of a button: yeah, randomly obfuscate all the tokens in there, and produce something like this. Then, if you're really twisted, you can take that and apply some string obfuscation on top, like this. I've recently been decoding this stuff, because APT32, a nice Vietnamese APT group also known as OceanLotus, happens to like this combo quite a bit. They'll do one layer of token all and then literally five or six layers of this string stuff. So I've gotten a lot of practice, Lee. You make your bed, you lay in it, man. I'm telling you, man. I'm telling you. Invoke-CradleCrafter: how does it look different from Invoke-Obfuscation? Invoke-CradleCrafter actually doesn't use any tick marks. It uses substitutions. Basically, if we have DownloadString, instead of concatenating it or using tick marks, it will enumerate all the methods available to New-Object Net.WebClient, and maybe the 37th one actually resolves to the string DownloadString.
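A Python sketch of the string-level layers described above: garbage delimiters plus reversal, with both ways of stripping the delimiters (split and join, or replace). The delimiter, the chunk size, and the helper names are hypothetical, and the sketch assumes the delimiter never appears in the original command. In PowerShell the same steps would use the -split and -join operators or the .Replace() method, but the shape of the transformation is the same.

```python
GARBAGE = "~"  # hypothetical delimiter; must not occur in the real command


def add_garbage(cmd: str, junk: str = GARBAGE, every: int = 3) -> str:
    """Insert a garbage delimiter after every `every` characters."""
    return junk.join(cmd[i:i + every] for i in range(0, len(cmd), every))


def obfuscate(cmd: str) -> str:
    """Layer 1: garbage delimiters. Layer 2: reverse the whole string."""
    return add_garbage(cmd)[::-1]


def deobfuscate_split_join(blob: str, junk: str = GARBAGE) -> str:
    """Undo the reversal, then split on the delimiter and join the pieces."""
    return "".join(blob[::-1].split(junk))


def deobfuscate_replace(blob: str, junk: str = GARBAGE) -> str:
    """Same result using replace instead of split/join."""
    return blob[::-1].replace(junk, "")


if __name__ == "__main__":
    original = "Write-Host 'hello world'"
    layered = obfuscate(original)
    print(layered)
    print(deobfuscate_split_join(layered))
```

Each layer is trivial to undo once you know it is there; the hard part for a defender is noticing it in the first place.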
So it's using all that kind of substitution there. So wouldn't it be terrible if there were actually new and worse obfuscation techniques that just hit the market like three days ago? Sorry. I've been sitting on this one for like six months, so I've got to get it out there. What if it's all special characters? Now, I have to say up front, kudos: this was not my original idea. A Japanese security researcher named mutaguchi wrote Hello World entirely in special characters using this technique back in 2010. So props to them. This is really, really freaking cool. But it's basically just a lot of different variables. Definitely really came up with this. But those variable names could also just be different amounts of whitespace. And then I was chatting with Casey Smith, or subTee, and he said, oh, that looks kind of similar to whitespace encoding. And I said, say what? And he said, yeah, you know, whitespace and tab encoding. I was like, that sounds amazing, let's do that right now. So that's the second one I released: the entire command is just whitespace- and tab-delimited. So that's out there now. And this is pretty much what defenders feel like, right? And I am a defender. It's my job to come up against this stuff. But as you could tell, he's a noted blue teamer. So I feel really bad now. I feel kind of guilt-tripped. Is there anything we can do for defenders out there? I guess not. Hold on. We're just getting into this presentation. I think there's some stuff we can do. So you might think: how in the world, as a defender looking at your logs, are you ever going to find any of that stuff? Hands down, you're kind of screwed. So we decided to dabble a little bit. We're not data scientists, nothing like that; we just decided to play around. Here's a core point, though: you don't need to detect all the stuff in there. All you need to know is that it exists.
Any of us, looking at that, realize that it's not normal stuff at all. What attackers are using as this amazing cloak of invisibility, we can do some smart stuff with and turn into a shining laser. If you see stuff like this in your network, you're screwed; you should take a look. You don't need logging tools or regexes telling you what it's doing. Just apply a bit of wetware and you're going to be in good shape. Now how can we do that, though? That sounds simple. One of the cool things you can do is simple character frequency analysis. We were talking about the big PowerShell corpus we made. Here's an example on the right-hand side where we did character frequency analysis against all of the scripts in PoshCode, which is a popular PowerShell script sharing repository. It looks kind of like English. If you've ever done any simple crypto, you kind of recognize those character frequencies. On the left-hand side, you see the character frequencies of some of the obfuscated scripts we just showed. Very, very clearly different, right? You've got a bunch of backticks and square brackets. This really, really stands out. So the question is: okay, yeah, it's a list of numbers, but how am I supposed to figure out how similar those lists of numbers are? There's a whole community in the world called information retrieval. They do things like search engines, and they analyze things like web pages and documents. They figure out different features and turn them into numbers, and then they compare those big lists of numbers to find lists that are similar. We're used to this from high school and graph paper: you've got two numbers that represent a line, another two numbers that represent another line, and you take the cosine of the angle between them. You can do some little math here on the right.
So you can compare those lines to figure out how similar they are using the cosine. It turns out that the information retrieval folks like to do this for more than two numbers: more than two dimensions, more than three dimensions, maybe a thousand or 2,000 dimensions. At that point you're talking about the angle between lines in thousands of dimensions. I'm having a hard time picturing it, but it's possible, and it gives you a number. So here's an example of actually running that. Read all the PowerShell. It's just PowerShell. But what you can see here is that we've got a huge grouping near the top: most of these things have very similar cosine similarity. But then these obfuscated ones stick right out: 0.157, 0.379. This is an atomic bomb. Take a look at the average similarity across all of PoshCode. There's a massive grouping up here. If you take a look at everything below 0.8, and we did, these things are almost all obfuscated. And when they weren't intentionally obfuscated, they were things like code golf competition entries, where people write garbage anyway. So if we could somehow automate this cosine similarity: problem solved. Run this on your logs, run this on your network, and you're good to go. These data points were generated, again, from all the scripts on PoshCode, four thousand, I believe. We really wanted more data, and Microsoft's been thinking about this for a while. Last spring, they ran a little contest called Underhanded PowerShell, where they invited the red team community to submit obfuscated and underhanded PowerShell commands that performed a very specific task while getting around certain Script Analyzer detection rules that were in place. So that was kind of neat. On top of that data, there are a lot of PowerShell scripts in the community that we wanted to gather. So we created a ginormous PowerShell corpus.
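The character-frequency and cosine-similarity approach described above can be sketched in standard-library Python. Building the baseline by averaging per-script frequency vectors is an assumption about how a corpus-wide profile like the PoshCode average might be computed, the 0.8 threshold is the one from the talk rather than a tuned value, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable

Profile = Dict[str, float]


def char_frequencies(script: str) -> Profile:
    """Fraction of the script made up by each character."""
    counts = Counter(script)
    total = sum(counts.values()) or 1
    return {ch: n / total for ch, n in counts.items()}


def average_profile(scripts: Iterable[str]) -> Profile:
    """Mean character-frequency vector over a corpus of known-normal scripts."""
    totals: Counter = Counter()
    n = 0
    for script in scripts:
        totals.update(char_frequencies(script))  # Counter.update adds values
        n += 1
    return {ch: v / max(n, 1) for ch, v in totals.items()}


def cosine_similarity(a: Profile, b: Profile) -> float:
    """cos(theta) = (a . b) / (|a| |b|), treating missing characters as zero."""
    dot = sum(v * b.get(ch, 0.0) for ch, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_obfuscated(script: str, baseline: Profile, threshold: float = 0.8) -> bool:
    """Flag scripts whose character profile points away from the baseline."""
    return cosine_similarity(char_frequencies(script), baseline) < threshold
```

In practice the baseline would come from a large corpus of normal scripts, and, as the talk goes on to show, a single threshold trades false positives against missed obfuscation.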
But since we're both gentlemen, we did it politely. Now what do I mean by that? Well, this is code that Lee wrote to scrape GitHub, for example. Those little blue portions are the code that actually downloads the scripts, and all those red portions are blatantly Canadian, because Lee is Canadian: very polite throttling. So anyway, we politely scraped. And a fun fact here: you were scraping GitHub for quite a while. Yeah, mad props to GitHub. I took a look at all the repositories and figured it would take about a month: there were, like, 11 million repositories to scrape through. I started going a month straight, just downloading, throttling, downloading, throttling. A month later, I look at my repository index: 12 million, 13 million. That doesn't make sense. I go off and look again, and I was off by an order of magnitude: it was 100 million repositories. And I was like, we've got to do something, man; we can't be rescheduling this to December. So we reached out to the GitHub folks. They did a little bit of a back-end query, zipped it all up, and sent us a zip of all the PowerShell. So mad props to them. Yeah, big thanks to them. The really big thanks, though, goes to all the contributors. So if you wouldn't mind raising your hand: if you've ever contributed a PowerShell script to PoshCode, TechNet, the PowerShell Gallery, GitHub, or GitHub Gist, are there any contributors in the house? Awesome. Please give yourselves a round of applause, because you made this research possible. When you assemble a very large corpus of PowerShell scripts, you're impelled and compelled to look at them. And it's very interesting when you start looking at all these scripts. I will never be the same man. Some of the stuff we found was honestly just really sad. Remove games.ps1. The author... oh, wow. It looks like it actually says Matt Graeber.
I don't know if that's right, but basically it goes through and kills any running game processes. And then, just to top it off, it actually removes the directory. So I don't know where the high scores are kept there, but that's pretty cold, kind of a buzzkill. Depraved man. But in all seriousness... no. A bit more seriously, we actually came across one plot to overthrow some really interesting people in power: this Down with SOPA script. "Let's fill the U.S. House and Senate servers with the message that we don't want SOPA," the Stop Online Piracy Act. Resist. So there's a lot of fun stuff in the corpus. Getting a little more serious: there are a lot of scripts here, 408,000 PowerShell scripts to be exact, from 4,000 authors. We actually identified roughly 1,600 obfuscated scripts in this data set. Then we generated about 4,000 obfuscated scripts using things like Invoke-Obfuscation, Invoke-CradleCrafter, and ISESteroids. And the important thing here is that we manually looked at and manually labeled around 7,000 scripts. That sounds pretty tedious. People think hacking is amazing and fast-paced, and it's not. It's a Notepad window popping up. You close it and say, that wasn't obfuscated. A Notepad window pops up. You close it. That was. And you get a big spreadsheet. So, mad hacking skills. But here's the thing: why did we do all that? Remember I showed this a little bit ago? The similarity, and how it kind of nailed this stuff below 0.8. Ryan Cobb did a pretty good analysis afterwards, a real scientific-method check of whether he could reproduce these results and whether it shows the kind of obfuscated stuff we were talking about. And he did find that, hey, with that magical 0.8 number you do have kind of a balance between false positives and false negatives. So mathematically, this is what it looks like when you run that algorithm against the PowerShell corpus and all of this labeled data. I showed you two obfuscated scripts back there.
But what if you know for a fact which of 1,000 PowerShell scripts are obfuscated and which aren't? Then you can start to get some really, really good numbers. The way the information retrieval and data science community talks about this stuff, there are two main metrics: precision and recall. Precision says: any time the tool says something is obfuscated, how often is it correct? In this situation, that was 89%. So for all those things below 0.8, you're almost 90% accurate at finding obfuscated PowerShell. Recall, though, is where it takes a dive. Recall asks how much of the obfuscated PowerShell the tool found at all, and what we didn't analyze before is all the stuff above 0.8. It turns out that was hiding a lot of stuff you would consider obfuscated, and that's reflected in a combined score people call the F1 score, the harmonic mean of precision and recall. So it's super powerful on obfuscated stuff, but there's really some opportunity here. This is PowerShell, though, the most powerful scripting language known to man. We have the power; we can do better. So instead of just looking at the character frequency of an entire script: PowerShell, like any language, has to basically interpret the script and ask, what am I actually looking at? This is an example of PowerShell tokenizing the command, and you can see Get-Command is a generic token, -Name is a parameter token, and we have the format operator, strings, et cetera. However, what we have at our fingertips is not just the tokenization PowerShell can provide, but also a tree. This is called the abstract syntax tree, or AST. Not only does it identify all the tokens, it actually identifies the relationships between those tokens across the entire script. So now you can see: okay, there's a format operator, but what's on its left, what's on its right, how many objects are there? This allows us to get a lot of interesting features.
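For reference, the three metrics mentioned above reduce to a few lines of Python over labeled data. The F1 score here is the standard harmonic mean of precision and recall, so a high precision paired with a low recall still produces a low F1. The function name and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float, float]:
    """Compare tool verdicts against manual labels (True = obfuscated)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)      # correctly flagged
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)  # flagged, but clean
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)  # missed obfuscation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: the tool flags three scripts, two correctly, and misses two.
    print(precision_recall_f1([True, True, True, False, False],
                              [True, True, False, True, True]))
```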
Now if you want to easily view this, there's an awesome GUI, a PowerShell AST explorer, that's hosted on the PowerShell Gallery, so you can literally just type Install-Module ShowPSAst and start running it right away. It's a really, really nice GUI for exploring the AST: type any command on the right and check out the AST. Now why do we do this? Well, with the AST we can get extremely granular. For example, we can group and count all the different AST types. So maybe one script is 33% strings and that's all it is. Or maybe 99% of the script is a massive array. We also look at things like array size ranges; if we have a 5,000-element array, maybe that's shellcode. I don't know, that's pretty large. We can also look at language operators: assignment, binary, unary, and invocation operators. And then for every single component we added an additional layer of feature extraction, doing character frequency analysis just on those types. So, for example, all cmdlets go in one group, then all strings, all methods, all members, and for each group we look at character frequency, entropy, whitespace density, and the maximum, minimum, median, mode, range, and average length on top of that. We also look at the percentage of upper- and lowercase characters, to catch randomized character casing. So this actually produces quite a few features: about 5,000 of them, raw. What do we do with 5,000 features? That was a tough realization. You've got 5,000 features, you know there's awesome stuff in there, but what do you do with it? There's a common method out there for classifying data called linear regression. That's the example on the left-hand side.
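The per-group statistics can be sketched as below. This Python version assumes the tokens have already been grouped (in Revoke-Obfuscation that grouping comes from PowerShell's own AST, which this sketch does not reproduce), and the feature names are hypothetical.

```python
from __future__ import annotations

import math
import statistics
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def group_features(group: str, tokens: list[str]) -> dict[str, float]:
    """Summary features for one group of tokens (e.g. all cmdlet names, all strings).

    The grouping is assumed to come from a parser; tokens are passed in directly.
    """
    joined = "".join(tokens)
    lengths = [len(t) for t in tokens] or [0]
    letters = [ch for ch in joined if ch.isalpha()]
    return {
        f"{group}_entropy": shannon_entropy(joined),
        f"{group}_whitespace_density": (
            sum(ch.isspace() for ch in joined) / len(joined) if joined else 0.0
        ),
        # Share of uppercase letters; random casing like gEt-cHiLdItEm pushes this up.
        f"{group}_upper_pct": (
            sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
        ),
        f"{group}_len_max": max(lengths),
        f"{group}_len_min": min(lengths),
        f"{group}_len_median": statistics.median(lengths),
        f"{group}_len_mode": statistics.mode(lengths),
        f"{group}_len_range": max(lengths) - min(lengths),
        f"{group}_len_avg": statistics.mean(lengths),
    }


features = group_features("cmdlet", ["Get-ChildItem", "gEt-cHiLdItEm"])
print(features["cmdlet_upper_pct"])
```

The normally cased name alone has 3 uppercase letters out of 12 (0.25); adding the randomly cased copy raises the group's share to 8 out of 24, one third. Repeating a function like this for every token group is how a few dozen statistics multiply into thousands of features.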
What you do is have one big math equation, and if the number is above a boundary, the script is obfuscated; if it's below, it's not. Now, the AST features we were talking about really let you capture the techniques of obfuscation itself, not just simple metrics. But the raw scores end up being rangy, some really large and some really small, so you put the weighted sum through a logistic function, which squashes it into the range between zero and one. Combining all the features we extract that way is called a logistic regression. It's a really, really common technique; Excel, for example, lets you do a lot of this as well. So here's what it looks like. You have all of the features, f1, f2, f3, and each of them has a weight. I add up one feature times its weight, plus another feature times its weight, and so on. Big, big, big: 5,000 of these features. So here's the big issue, and I kind of dodged the question: what do I do about 5,000 features? Chris back there saw 5,000 features and didn't know what to do. This is what you've got to do. It's called gradient descent, and the idea is that you don't go ask Debo, "Hey, Debo, how important do you think a square bracket is?" "That's a 0.2. I know a 0.2 when I see one." What you don't know there is what that's going to do to your false positives and everything else. So with gradient descent, remember we had all that labeled data. We start with 5,000 weights, run a simulation, and see how good they are at classifying. Whenever the model makes a mistake, you propagate that error back into all those weights. Weights on the features that contributed more to the mistake get adjusted more significantly than the others, and when the model gets an example right, the weights barely move.
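The weighted sum, the logistic function, and the gradient-descent weight updates fit in a few lines. This is a plain stochastic-gradient-descent sketch in Python on the log loss, not the C# implementation used by Revoke-Obfuscation; the two features and their values are hypothetical stand-ins for the roughly 5,000 real ones.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into (0, 1), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Weighted sum of features (f1*w1 + f2*w2 + ...) passed through the sigmoid."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


def train(samples, labels, learning_rate=0.1, epochs=1000):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)  # near 0 when right
            # Each weight moves in proportion to the error and its feature's value.
            for i, f in enumerate(features):
                weights[i] += learning_rate * error * f
            bias += learning_rate * error
    return weights, bias


# Hypothetical two-feature data: [special-character ratio, uppercase ratio].
samples = [[0.05, 0.10], [0.10, 0.15], [0.70, 0.50], [0.90, 0.45]]
labels = [0, 0, 1, 1]  # 1 = obfuscated
weights, bias = train(samples, labels)
print(predict(weights, bias, [0.80, 0.50]) > 0.5)  # True
print(predict(weights, bias, [0.05, 0.10]) > 0.5)  # False
```

Nobody picks the weight for a square bracket by hand here: the labeled examples push each weight up or down until the mistakes on the training data shrink.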
Over thousands and thousands of iterations (this is just automated statistics) you end up optimizing toward a model that's really, really good based on what you measure. So you might have felt a kind of pit of despair when you saw all that obfuscation. Now, when you start using logistic regression and gradient descent, you might be happy with that F1 score. Remember we were down at about 57%; wouldn't it be nice if we could get to 60, 70%? How about 95%? Using Revoke-Obfuscation, you can detect 94% of in-the-wild obfuscated PowerShell with only 1% false positives. That is literally 10 times better than cosine similarity, with half the false positives. And we can do even better. That was in-the-wild obfuscated PowerShell, which tends to be throw-everything-at-the-wall, obfuscate-all-the-things. But what if you start talking about stuff that just feels kind of sketchy? On the left you see massive, massive obfuscation. Yeah, absolutely, that's an easy target. But the example on the right isn't really obfuscated. It's just using unusual PowerShell: a big Select-Object with a bunch of hash tables. That's a little bit out there. What could you do if you applied Revoke-Obfuscation to just mildly sketchy stuff? This is an example of something suspicious, so you want to widen the net a little bit. On sketchy stuff it's still really, really great: it returns roughly 88 to 89% of all the sketchy stuff, with very, very low false positives. But what's even cooler is what that wider bucket does for in-the-wild obfuscated PowerShell. Revoke-Obfuscation by itself detects 94% of in-the-wild obfuscated PowerShell; this brings it to 97% recall. Super crazy.
This is a nuclear bomb against anybody trying to use obfuscated PowerShell in an engagement, and as a blue teamer or a defender, this thing has got to be in your arsenal. Now, logistic regression and gradient descent aren't the only things out there. There are other cats in the game: boosted decision trees, support vector machines, and all that kind of stuff. So here's one of the cool things we got out of doing all this. Our model was a hand-implemented algorithm in C#, based on some cool material shared by James McCaffrey on MSDN. We compared it against others: we just had a big CSV of all these features, and you can feed it in like this. Revoke-Obfuscation's model performed basically the same as the same algorithm implemented in a commercial machine learning system. Then you can try other algorithms: the next closest was boosted decision trees, which had about the same accuracy, and two of the other algorithms we messed around with did much worse. So what we've baked into Revoke-Obfuscation is a really top-end model for you. Would anyone like to see a demo? Yeah. So I like to think that Revoke-Obfuscation is a really clean, pure cmdlet approach. A lot of the tools I write have ASCII art like this. Console GUIs for the win. So the first thing here is me getting all of my ASCII art out of the way. Just to show the level of what we're doing here: again, 5,000 features, and on average less than 300 milliseconds to extract the features and measure a script. And here's just a menu. This is completely for fun and for LOLs. There's a tutorial, which is basically a colored version of the README. There are a lot of fun facts, lots of interesting stuff you can look at there. You also see a lot of interesting ASCII art when you're going through all these scripts.
So it'll randomly show you some ASCII art and the project it came from. There's a set of fun quotes, and credits. Again, if you've ever contributed any PowerShell to GitHub or other sources, your name is actually in this code, and if you run it enough, you will see it. So, on to the stuff that actually does stuff. Most people don't have a huge corpus of PowerShell scripts from their environment to analyze. Revoke-Obfuscation handles both command lines and scripts, and we're trying to make it as operationally friendly and easy to implement as possible. So let's say that you want to query your event logs using Get-WinEvent or CimSweep, or maybe you just want to collect the raw EVTX files. That's totally fine, because we wrote a function called Get-RvoScriptBlock, RVO standing for Revoke-Obfuscation. It extracts all of the script blocks and reassembles the ones that are split across multiple script block log entries. So what's really nice is that if you want to start with event logs, you can say, all right, let me get all these event logs, pipe them into Get-RvoScriptBlock to retrieve all of the scripts, and then pipe those into Measure-RvoObfuscation. And there it is, churning through them. Thanks, RVO. As you can see, it caught our fun example made entirely of special characters, with a nice Obfuscated: True. And all the script features are there: everything, the amount of time it took to extract the features, the measurement, and all that stuff. The very last thing I'll say here is that our desire is not just for this to be something that's used in research, but really to make it accessible to any organization. We want you to be able to take this and literally run it within minutes.
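Get-RvoScriptBlock does this reassembly in PowerShell; the Python sketch below only illustrates the idea and is not its implementation. It assumes script block logging events (ID 4104) have already been exported into dicts carrying the event's ScriptBlockId, MessageNumber, MessageTotal, and ScriptBlockText fields; the helper name and sample events are hypothetical.

```python
from collections import defaultdict


def reassemble_script_blocks(events):
    """Rebuild full scripts from script block logging fragments.

    Returns {ScriptBlockId: full script text}; blocks with missing parts are skipped.
    """
    fragments = defaultdict(dict)  # ScriptBlockId -> {MessageNumber: text}
    totals = {}                    # ScriptBlockId -> MessageTotal
    for event in events:
        block_id = event["ScriptBlockId"]
        fragments[block_id][int(event["MessageNumber"])] = event["ScriptBlockText"]
        totals[block_id] = int(event["MessageTotal"])

    scripts = {}
    for block_id, parts in fragments.items():
        total = totals[block_id]
        # Only emit a script once every numbered fragment 1..total has been seen.
        if all(n in parts for n in range(1, total + 1)):
            scripts[block_id] = "".join(parts[n] for n in range(1, total + 1))
    return scripts


events = [
    {"ScriptBlockId": "b1", "MessageNumber": 2, "MessageTotal": 2, "ScriptBlockText": "Host 'hi'"},
    {"ScriptBlockId": "b2", "MessageNumber": 1, "MessageTotal": 1, "ScriptBlockText": "Get-Date"},
    {"ScriptBlockId": "b1", "MessageNumber": 1, "MessageTotal": 2, "ScriptBlockText": "Write-"},
    {"ScriptBlockId": "b3", "MessageNumber": 1, "MessageTotal": 3, "ScriptBlockText": "partial"},
]
print(reassemble_script_blocks(events))
# {'b1': "Write-Host 'hi'", 'b2': 'Get-Date'}
```

Fragments can arrive out of order, so they are keyed by MessageNumber and joined only when the block is complete; a scorer then sees whole scripts rather than arbitrary slices.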
To help facilitate that, we have it hosted on GitHub right here, but it's also hosted in the PowerShell Gallery, which means you can literally fire up PowerShell, run Install-Module Revoke-Obfuscation, and it's there: locked and loaded, ready to go. And the one last thing I'll say is that, again, we want this to be as accessible and easy to use in an operational sense as possible for any defender out there. And that is our talk. Thank you.