Hey, welcome back to AppSec Village. Hope you're having a great time here at DEF CON 2020. Get ready for another wonderful talk. We've got Jarrod Overson, and he's going to talk about Hackium, a browser for web hackers. Okay, let's be honest: with a name like Hackium, you kind of have to check it out. It's like the coolest name ever. I know I'm going to be watching this one. Jarrod is the director of engineering at Shape Security, now F5, where he designed and led the development of Shape's enterprise web security platform. Jarrod is a frequent speaker on modern web threats and has been quoted by Forbes, The Wall Street Journal, and CNET, among many others. He co-authored O'Reilly's Developing Web Components, has delivered dozens of analysis and reverse engineering tools, and frequently writes about web development and fraud. Please welcome Jarrod to the AppSec Village stage.
Hey, hi, thank you. Thank you for being here. This is awesome. This is so cool. It's also strange, I can't lie. I can't see you; you're not actually here with me. I was going to have slides, and then it started feeling like a webinar with my little picture in the corner, and a webinar is the last thing I think of when I think of DEF CON. I physically could not do it. So we're going to do this live, except prerecorded with a whole bunch of cuts. It is what it is; this is a strange time. Anyway, I'm excited to talk to you about what I'm going to talk about. But first, who am I? I'm Jarrod Overson. I talk a lot about credential stuffing and automated attacks, and I'm a web hacker. I've been hacking the web for a million years. I work at Shape Security, now part of F5, as Director of Engineering, where I built Shape's enterprise web application security platform.
It was an awesome opportunity because it gave me the chance to build a product that would defeat me. That was fun: I could do this, but then I could beat myself by doing that, and if I did that, then I could do this. It was a cool, fun game against myself, and I ended up building something pretty cool. I'm no longer building that, though. I'm building other things, which is what I'm going to talk to you about today. But enough about me. Let's talk about you. I mean, everybody knows who you are. It's really awesome to have you. Thank you very much.
So let's get started. I'm really excited to introduce three major tools, really a whole suite of tools, that I guess I've been working on my entire life. Every six to twelve months I seem to rebuild part of these tools in order to do something with the web that I want to do but can't, given the way the web works. Over the past 20-plus years I've been crafting and honing my strategy for manipulating web stuff, and over the past year and a half I've started to focus that into a few projects to see how good I could make them. I'm really happy with what I've got. So today I'll be talking about three major projects. Hackium is the one that will be most interesting to most people because it feels like a browser, but shift-refactor and shift-interpreter are much more technically complex and give you the ability to mold and twist JavaScript like a magician. I know JavaScript is not always the most well-respected language in security circles, but you can't get away from it on the web. JavaScript is what you've got, and you have to deal with it.
And as more and more business logic shows up in websites, you have to understand more of it, because what a web page transmits to a server is becoming less and less understandable on its own. All these tools together give you the ability to manipulate, understand, and transform web stuff so that you can control it better, which is awesome.
In this session I'm going to cover three or four major topics. First, what Hackium is, including the REPL and the init command and configuration, so you can get started. Then, how to use Hackium with external services like 2Captcha to solve CAPTCHAs, and how Hackium generates human-like behavior so it bypasses trivial defenses more easily. Finally, how to use Hackium with shift-refactor and shift-interpreter to automatically and programmatically deobfuscate JavaScript.
Hackium is a command-line tool. It can act as a browser, it's an automation framework, and it's a foundation for building web hacking tools. Hackium is a Node.js library installable via npm: npm install -g hackium installs the Hackium library and the command-line tool globally. npm then does everything else it needs to do, including downloading Chromium r782078. Chromium is currently downloaded via Puppeteer, and it's a bundled Chromium that is guaranteed to work with that particular version of Puppeteer. Hackium started as a bunch of Puppeteer scripts that grew and were extended and copied over and over again over a long period of time. For now, Hackium extends a lot of Puppeteer's functionality and reuses its bundled Chromium, and it will keep doing so for as long as possible, until the code bases diverge to the point where code reuse is no longer possible.
All right, now we have Hackium installed locally, version 1.0.4. When we start it up, we get what looks like a normal Chrome browser, and we can use it just like any other browser. It comes with one extension preinstalled, the Puppeteer extension bridge, which is one of Hackium's dependent projects. It lets Hackium communicate to and from the Chrome extension API, so Hackium exposes the whole Chrome extension API to Node.js and automation scripts, giving you access to a lot of functionality that is normally inaccessible from command-line scripts. Hackium also includes an in-page client for plugins and other functionality to hook onto, so there's a consistent way to share data and communicate between the extension and the automation script behind Hackium. We can use the event bridge to communicate with the extension and post messages back to our automation script, and if we had anything else on here, we'd be able to access it too.
Hackium also exposes a REPL on the command line. We have access to page, which is our active page, so page.goto('https://example.com') takes us to example.com. Let's go back to Google. You can use the REPL to do everything you would normally do within a Puppeteer or Hackium script. Here we're typing into the input whose aria-label attribute is "Search": we type "hackium" with a newline at the end, which searches for the term Hackium on Google. This is a good way to experiment with automation, and when you exit you can use the REPL history to get a recording of everything you typed. You can copy and paste those lines directly into a Hackium script and share them with other people. Shareability was a very high priority for Hackium, because historically it's been pretty difficult for people to share their work manipulating websites.
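Because REPL history can be pasted straight into a script, the Google search above might end up looking roughly like the sketch below. It relies only on Puppeteer's page.goto and page.type; the helper name `searchGoogle` is hypothetical, and `page` is passed in as a parameter (standing in for the REPL's active page) so the function does not depend on how a given Hackium script obtains it.

```javascript
// Hypothetical helper replaying the REPL session as a script.
// `page` is assumed to be a Puppeteer-style Page (goto/type), like the
// active `page` in the Hackium REPL; it is a parameter so no browser is hard-wired.
async function searchGoogle(page, term) {
  // Navigate the active page, as typed in the REPL.
  await page.goto("https://www.google.com");
  // Type into the input whose aria-label is "Search". Puppeteer's keyboard
  // treats "\n" as the Enter key, which submits the search.
  await page.type('[aria-label="Search"]', `${term}\n`);
}
```

With a real Puppeteer page this would be called as `await searchGoogle(page, "hackium")`.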
If I made something neat that manipulated google.com, or changed the functionality of Spotify or Twitter or anything else, it would be fairly difficult for me to quickly share it with somebody else. There are two main ways people have historically manipulated websites: via the browser's built-in developer tools, with a proxy, or both. A proxy gives you the ability to intercept, manipulate, and change any web transaction, which is very powerful. But if you've worked with proxies extensively, you know they can be a pain to install, especially across platforms like Windows, Mac, and Linux. And if you go this route, chances are you have a favorite proxy, and the person you want to share with has a different favorite proxy, or a different configuration for the same proxy, or something else that makes it difficult to quickly share your work. If instead we do a lot of our work in the browser's built-in developer tools, which are extremely powerful and fantastic, we have essentially no way to share that with anybody else. A lot of the work is manual and difficult to automate, if it's possible to automate at all. There are snippets and other little features you can use for repetitive work, but those aren't easily shareable, and they're not easy to configure for yourself in the first place. All in all, it's kind of a pain: you get everything set up for yourself, and it becomes difficult for anyone else to use your stuff. So with Hackium, it was a priority to make everything as self-contained and bundled into one project as possible, so that you could share a small configuration and implementation with anybody and they could see what you've done.
One important aspect of shareability is making sure that the person it's destined for doesn't have to know anything secret or magic about your configuration to get up and running. Hackium takes care of that by storing a lot of the base configuration in a configuration file that is just JSON or JavaScript, and it's portable, so you can deliver it to anyone else. You can initialize one easily with hackium init. You can set the URL you want to go to by default. You can configure whether DevTools opens automatically, which might be important if your script communicates with DevTools; we're not going to do that here. We can also create a blank JavaScript injection. Another top priority for Hackium was to make it as trivial as possible to inject JavaScript before anything else is loaded, which is the only way to guarantee a pristine, stable environment for your code, and to make it trivial to intercept and manipulate incoming content so that you don't need a proxy at all. Hackium makes that possible with generic injection files and interceptor modules. We'll say no to the injection, create a boilerplate interceptor and, sure, a boilerplate Hackium script, and we don't want to run headless.
Now we have our hackium.config.js, a simple JavaScript file with the configuration we set up, and a boilerplate interceptor. An interceptor is just a standard Node.js module that exports two properties. The first is the intercept property, an array of request patterns that this interceptor is configured to intercept. By default, you can see we have a wildcard asterisk for the URL pattern, we're matching the Script resource type, and we're intercepting at the response stage.
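A sketch of such an interceptor module follows. The `intercept` and `interceptor` export names come from the talk (the second export is described next). The pattern fields mirror the Chrome DevTools Protocol's Fetch.RequestPattern (`urlPattern`, `resourceType`, `requestStage`), and the handler receiving a single `{ request, response }` object is an assumption, so compare it against the boilerplate that hackium init actually generates.

```javascript
// Sketch of an interceptor module. Export names are from the talk; the pattern
// fields mirror CDP's Fetch.RequestPattern, and the handler signature
// (one { request, response } object) is an assumption.
const intercept = [
  {
    urlPattern: "*", // wildcard: match every URL
    resourceType: "Script", // only JavaScript resources
    requestStage: "Response", // run once the response arrives, before the page sees it
  },
];

// Receives the intercepted exchange and returns the (possibly modified) response.
// This example just prepends a comment naming the intercepted URL.
function interceptor({ request, response }) {
  response.body = `/* intercepted: ${request.url} */\n${response.body}`;
  return response;
}

module.exports = { intercept, interceptor };
```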
The second thing an interceptor needs to export is the interceptor property, a synchronous or asynchronous function that can do something with a request. To share this work, we can archive this directory, upload it to GitHub, or publish it to whatever publishing service you want, and anyone can download it, run the command hackium, and see what you did.
Now that we've gone over the REPL and getting started, we can jump into what makes Hackium different from other browsers and how to use its APIs to connect to external services. If you follow me at all, you know I talk a lot about automated attacks like credential stuffing and scraping, and about the trajectory of tools looking more and more human. Also, since this is a browser controlled partially by scripts, extensions, or power tools, it's important that the behavior it generates looks plausibly human so that it doesn't interrupt your browsing. So one of the extensions to Puppeteer that Hackium provides is automatically generated, simulated human behavior. Puppeteer and a lot of other automation tools can slow behavior down so that it doesn't all come puking out onto the page at once; you don't type large blocks of text within a fraction of a second. But that's still not quite human enough. Humans don't have consistent durations between key presses, or mouse movements that all take the same number of steps. So with Hackium I made sure that things like keyboard events happen at varying intervals and durations, all within a minimum and maximum range of expected human behavior, and that mouse movements follow indirect paths and sometimes make mistakes, like this. This is a slow, meandering move.
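The talk doesn't show Hackium's own implementation, so the sketch below only illustrates the two ideas just described: per-keystroke delays drawn from a minimum/maximum range, and a curved mouse path that overshoots the target and backtracks. Both helpers are hypothetical; uniform delays, a quadratic Bézier curve, and a 5 to 15 percent overshoot are arbitrary choices made for illustration.

```javascript
// Hypothetical helpers illustrating the idea; not Hackium's implementation.
// `rand` is injectable (defaults to Math.random) so results can be reproduced.

// One delay per character, each drawn uniformly from [minMs, maxMs].
function keystrokeDelays(text, minMs = 50, maxMs = 200, rand = Math.random) {
  return Array.from(text, () => Math.round(minMs + rand() * (maxMs - minMs)));
}

// Point at parameter t (0..1) on the quadratic Bezier curve p0 -> p2 with control c.
function bezier(p0, c, p2, t) {
  const u = 1 - t;
  return {
    x: u * u * p0.x + 2 * u * t * c.x + t * t * p2.x,
    y: u * u * p0.y + 2 * u * t * c.y + t * t * p2.y,
  };
}

// Curved path from `from` that overshoots past `to`, then backtracks onto it,
// instead of a straight line with no intermediate points.
function mousePath(from, to, steps = 25, rand = Math.random) {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Overshoot by 5-15% of the distance along the direction of travel.
  const k = 1.05 + rand() * 0.1;
  const overshoot = { x: from.x + dx * k, y: from.y + dy * k };
  // Push the control point sideways (perpendicular to travel) to bend the path.
  const bend = (rand() - 0.5) * 0.6;
  const control = {
    x: (from.x + overshoot.x) / 2 - dy * bend,
    y: (from.y + overshoot.y) / 2 + dx * bend,
  };
  const points = [];
  for (let i = 0; i <= steps; i++) {
    points.push(bezier(from, control, overshoot, i / steps));
  }
  // Short correction from the overshoot back to the target.
  const back = Math.max(2, Math.round(steps / 5));
  for (let i = 1; i < back; i++) {
    const t = i / back;
    points.push({
      x: overshoot.x + (to.x - overshoot.x) * t,
      y: overshoot.y + (to.y - overshoot.y) * t,
    });
  }
  points.push({ x: to.x, y: to.y }); // end exactly on the target
  return points;
}
```

With Puppeteer, each point could be fed to page.mouse.move(p.x, p.y), and each character could be typed individually with page.keyboard.type(char) followed by a wait of the matching delay.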
You can see down here that it has to curve up to get to the point we're targeting. Refresh again: random, a larger curve, much slower this time, also overshooting and then backtracking. Do it again: overshooting, backtracking. If you were watching this, you wouldn't say it's definitely automated behavior. This matters because naive automation detection techniques check whether you're clicking at the same position, like the 0,0 coordinate, on every element, or zipping from one point to another with no mouse move events in between, or moving in a straight line or at a constant velocity, things like that. We have to account for that so we can generate behavior that isn't automatically blocked by every defense out there.
Another thing you'll probably know if you follow me is that I hate CAPTCHAs. CAPTCHAs are such a pain in the ass, and most of them are really, really bad. Google's reCAPTCHA seems to have gotten much worse over time, with things like the little grid of pictures where you have to figure out which ones show a fire hydrant, a bus, or a crosswalk. CAPTCHAs pop up in the way of automation all the time, and it's an arms race: someone builds an automated tool that bypasses CAPTCHAs, the CAPTCHA makers get that tool and figure out how to block it, and so on. So rather than build any CAPTCHA solving into Hackium, it's much easier to delegate that responsibility to services and companies whose core purpose is solving and bypassing CAPTCHAs. You can wire up a service like 2Captcha very easily in Hackium scripts, just by using standard Node libraries and modules to make requests and parse JSON responses. Here's a script I put together. I'm reading in my API key here so you don't see it.
Next, we go to https://old.reddit.com/login. We initiate a CAPTCHA-solving request with our API key and get back a request ID. Then we type our username, SamuelClemens90210, into the user_reg input, and type our password, "ctech astronomy," into both password inputs. Then we poll for the result. These CAPTCHA solvers work by having you send a request that says "solve this CAPTCHA," and then poll for a response until the CAPTCHA is done, so you're not just sitting there waiting; you can do other work while it's being solved. Once it's solved, we put the CAPTCHA response into the g-recaptcha-response element on the page and submit.
Now we can run this script, index.js, with Hackium. It pops open Hackium and goes to Reddit's sign-up page. Notice that the typing is not instantaneous, but it's not consistently slow either; it's kind of a bleep, bleep, bleep, bleep. That's the technical term. Once the typing is done, we pop over to the console and see that we're waiting out an initial timeout for 2Captcha, then we're done waiting and we're polling for our response. Now that it's done, pop over to the web page and bam: SamuelClemens90210 is on Reddit, able to, I don't know, prop up whatever political candidate we're interested in this election cycle. This is just one technique for getting past the different defenses out there. And I want to make this clear, since I currently work at Shape and F5: Hackium is not a tool designed to bypass security defenses. Security defenses for things like credential stuffing, spamming, or scraping operate at high volumes, and the kind of work a legitimate, curious hacker using Hackium would do is much, much lower volume. You can bypass things at small volume, but you'd be caught at high volume. Cover my ass.
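The submit-then-poll flow described above can be sketched as follows. The endpoints and fields follow 2Captcha's classic in.php/res.php JSON API as I understand it, so treat them as assumptions and check the service's documentation. `fetchJson` is injected rather than hard-coded, and `siteKey` (the page's reCAPTCHA site key), the 5-second interval, and the attempt limit are illustrative values.

```javascript
// Sketch of the submit-then-poll flow. Endpoints and fields follow 2Captcha's
// classic in.php / res.php JSON API as I understand it; verify against their docs.
// `fetchJson(url)` is injected (on Node 18+, e.g. url => fetch(url).then(r => r.json())).
const BASE = "https://2captcha.com";

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Ask the service to solve a reCAPTCHA; resolves to the request ID to poll with.
async function submitRecaptcha(fetchJson, apiKey, siteKey, pageUrl) {
  const params = new URLSearchParams({
    key: apiKey,
    method: "userrecaptcha",
    googlekey: siteKey,
    pageurl: pageUrl,
    json: "1",
  });
  const res = await fetchJson(`${BASE}/in.php?${params}`);
  if (res.status !== 1) throw new Error(`submit failed: ${res.request}`);
  return res.request;
}

// Poll until the solution is ready. Other work, like typing the credentials,
// can happen while this promise is pending.
async function pollResult(fetchJson, apiKey, id, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  const params = new URLSearchParams({ key: apiKey, action: "get", id, json: "1" });
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(intervalMs);
    const res = await fetchJson(`${BASE}/res.php?${params}`);
    if (res.status === 1) return res.request; // the solved token
    // "CAPCHA_NOT_READY" (sic) is the service's own spelling for "keep waiting".
    if (res.request !== "CAPCHA_NOT_READY") throw new Error(`solve failed: ${res.request}`);
  }
  throw new Error("timed out waiting for the CAPTCHA solution");
}
```

In the demo's order, you would start submitRecaptcha first, type the username and password, then await pollResult and write the token into the g-recaptcha-response element (for example with Puppeteer's page.evaluate) before submitting the form.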
All right, so far I've talked about basic usage of Hackium with the REPL and other features, the extensions it has to make it look more human-driven, and an example of how to tie it to other services. I haven't touched at all on the other parts of the suite, shift-refactor and shift-interpreter, nor have I talked much about the point I made at the start, which is manipulating website content. That leads me to Hackium interceptors. As we've already seen, interceptors are just basic Node modules that intercept a particular URL or request pattern and do something with it. You can use the hackium init command to initialize some basic interceptors. We'll call ours interceptor.js. The templates include a basic interceptor, which doesn't do much and just gives you the boilerplate, and a pretty printer, which uses Prettier to format JavaScript. We're going to start with a basic transformer using shift-refactor.
shift-refactor is a library I created that makes it easy to query and transform the nodes of a JavaScript AST. An AST, or abstract syntax tree, is just a data structure that represents something that was parsed; the AST I'm talking about is a big old JavaScript object that represents JavaScript source code. You can use a tool like AST Explorer to paste JavaScript in and see the AST it generates, click through each of the nodes, and see their types and how they're structured. If you change any of those nodes and regenerate source from the AST, you change the JavaScript source. shift-refactor takes JavaScript source, parses it into an AST, analyzes the scope tree, creates a parent mapping, and generally does a lot of the legwork that makes queries and transformations easy. It also leverages shift-query, which lets you query a JavaScript AST with CSS-like selectors.
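To make "query and replace nodes" concrete, the sketch below applies the console.log-to-alert transform the talk walks through next, but on a hand-written Shift-format AST with a small hand-rolled walker. It does not use shift-refactor's API. The selector in the comment is esquery-style syntax of the kind shift-query accepts, shown only for comparison with the predicate function.

```javascript
// Rebuild a tree bottom-up, replacing every node for which `match` returns true.
// The input is not mutated; matched nodes are passed to `replace` as copies.
function replaceNodes(node, match, replace) {
  if (Array.isArray(node)) return node.map((n) => replaceNodes(n, match, replace));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) copy[key] = replaceNodes(value, match, replace);
  return match(copy) ? replace(copy) : copy;
}

// Same intent as a selector like:
// CallExpression[callee.object.name="console"][callee.property="log"]
const isConsoleLog = (n) =>
  n.type === "CallExpression" &&
  n.callee.type === "StaticMemberExpression" &&
  n.callee.object.type === "IdentifierExpression" &&
  n.callee.object.name === "console" &&
  n.callee.property === "log";

// Build alert(...) and carry over the original arguments.
const toAlert = (n) => ({
  type: "CallExpression",
  callee: { type: "IdentifierExpression", name: "alert" },
  arguments: n.arguments,
});

// Shift-format AST for the statement: console.log("hi");
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "StaticMemberExpression",
      object: { type: "IdentifierExpression", name: "console" },
      property: "log",
    },
    arguments: [{ type: "LiteralStringExpression", value: "hi" }],
  },
};

const result = replaceNodes(ast, isConsoleLog, toAlert); // now represents alert("hi");
```

The shape is the point: a predicate picks nodes out of the tree, much like a CSS selector picks elements out of the DOM, and a function produces the replacement node.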
I modeled shift-refactor on the jQuery API, so navigating, traversing, and replacing large trees feels very similar. Here we have a script object built around our parsed response body. This expression replaces all console.log calls with alert calls: we query for call expressions whose callee has an object named console and a property of log, replace those with call expressions for alert, and carry the arguments over from console.log to alert. I know I'm breezing through this and glossing over some fairly complex topics if you haven't played with JavaScript ASTs before. But I can't stress this enough: despite all the crazy terms and syntax, like CallExpression, callee, IdentifierExpression, and more that you'll see, those are just nodes of a JavaScript tree. All you're dealing with is a gigantic JavaScript tree, which you navigate and traverse much like you would the DOM. It might look foreign at the start, but don't let that dissuade you from jumping in, because it's not difficult to get a handle on once you get some practice.
Here we've got a basic web page that exists only to simulate a real-world scenario, because I'm going to be programmatically deobfuscating JavaScript. There are plenty of real-world examples out there, and you've probably seen a few if you've looked through a web page's source, but we run into sketchy legal ground if I deobfuscate those live in front of people. So I've crafted a nice little sample website that looks like a payment form and conveniently has a bit of obfuscated JavaScript floating around on it. This sample was obfuscated with obfuscator.io, a pretty common JavaScript obfuscation website, because it's the first result that pops up when you search for "JavaScript obfuscator."
It actually does a pretty good job of obfuscating JavaScript: it can inject dead code, shuffle things around, and more. But JavaScript is extremely difficult to protect, and obfuscation doesn't put much of a barrier in front of an adversary trying to understand it. We're going to show how this can be completely reversed before the browser even gets it, so it looks as though you're browsing the web with deobfuscated JavaScript.
This is Firefox's dev tools. Even if we pretty-print this, it's not much more readable. We start off with a list of clearly encoded strings, followed by a variable declaration that, I don't know, does stuff. It has an alphabet here, and given that these strings are encoded, this is probably the decoder. Going down, we see references to that function all over the place: 0x227a, 0x227a, 0x227a. This is a common way of making it harder for somebody to immediately understand what the JavaScript is doing.
I created a complete interceptor here so I don't have to code in front of all of you, because that's not very exciting, is it? We've set our URL pattern to the vendor file on my local server, and we've fleshed out the interceptor so that it runs the script through shift-refactor, loads up an interpreter via shift-interpreter, and massages some JavaScript. We already went over refactor a little: we're passing it the response body and adding a common-methods plugin. Next, we create our interpreter instance, load the script's AST, and pass it a context that we're requiring in from here. The context is like the set of global variables accessible to the JavaScript, and by default this interpreter doesn't have anything in its global context.
So if we want to expose things the interpreter needs, we have to set them explicitly. We can do that ourselves; it's just a plain old JavaScript object. I've put together a fairly basic DOM look-alike context that uses jsdom to simulate a browser DOM, so if you're interpreting browser-side JavaScript, it has access to things like Image elements, atob, btoa, URL, decodeURI, and so on. One of the interesting things about this interpreter is that it was designed to take statements or expressions piecemeal. You don't have to execute a script statement by statement, expression by expression, in the order the script expects. You can take any statement or expression out of the AST, pass it to the interpreter, and interpret it as if it were the next statement or expression. This is very handy for JavaScript like this, because sometimes we don't want to execute all of it; we want to execute only the pieces we want to reuse, so that we don't have to write our own decoder functions or copy and paste anything. We can grab bits and pieces, interpret them, and use the results to deobfuscate the rest of the script. It's like using JavaScript against itself.
Here we run the first statement in the script, which primes this variable with these values so that anything else we execute can access it. Next, we get the decoder, which is the second statement. Rather than running it directly, we assign it to a variable so we can play with it later, and then we run that decoder statement, a variable declaration that assigns a function expression to a variable. Next up, we query the innards of that decoder statement to get the first binding identifier.
A binding identifier is an identifier, like a variable name, that is being assigned to, or having a value bound to it. We want it here because we want to know how this function is referenced, so we're looking for this binding identifier. See, it's not that complicated once you start poking around and playing with things. It sounds weird and scary to some people, but it's not that bad. Next up, we want all the references to that name: the binding identifier is called throughout the script, and we want all the places where it's called so we can do stuff with them. We map those references and grab each reference's node. A reference is an object with a node property, which is an AST node, and an accessibility property, which tells you whether the reference is being read, written, or both. Those reference nodes are all going to be IdentifierExpressions; an IdentifierExpression is just this bit of code right here. We want the places where the reference is used as an IdentifierExpression that is the callee of a CallExpression, so we get the parents of our references and filter them down to call expressions. That just makes sure we don't catch any references that aren't calls. I think almost all of them, if not all, are calls in this script, so it's not a big deal, but it's good to be safe. Then we replace each of those calls with the stringified result of the interpreter executing it. We need JSON.stringify so that we get a string containing a JavaScript string literal, which is the return value of these functions. Still with me? Basically, what we're getting are the decoded values of these strings.
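The decode step can be approximated without shift-refactor or shift-interpreter, which makes the idea easy to try. This sketch swaps the talk's approach (scope analysis plus piecemeal interpretation of AST nodes) for a regular expression and Node's built-in vm module: it runs only the string array and decoder in a fresh context, then evaluates each decoder call and splices in the JSON.stringify'd result. The obfuscated input and the names `_0x1a2b` and `_0x227a` are made up for illustration, and a regex cannot follow aliased or shadowed references the way scope analysis can, which is why the talk works on the AST.

```javascript
const vm = require("vm");

// Made-up input in the style described: a string array, a decoder that
// base64-decodes entries by hex index, then code that calls the decoder.
const prelude = [
  'var _0x1a2b = ["bG9n", "d29ybGQ="];',
  "var _0x227a = function (i) { return atob(_0x1a2b[parseInt(i, 16)]); };",
].join("\n");
const body = 'console[_0x227a("0x0")](_0x227a("0x1"));';

// Escape a name for use inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function decodeCalls(preludeSource, bodySource, decoderName) {
  // Like the interpreter's context, a vm context has no browser globals,
  // so expose what the decoder needs (here only atob) explicitly.
  const context = vm.createContext({
    atob: (s) => Buffer.from(s, "base64").toString("latin1"),
  });
  // Run only the string array and the decoder, priming them in the context.
  vm.runInContext(preludeSource, context);
  // Match calls such as _0x227a("0x1") and replace each with a string literal
  // holding the value the decoder returns for it.
  const call = new RegExp(`${escapeRegExp(decoderName)}\\(\\s*(["'])(0x[0-9a-f]+)\\1\\s*\\)`, "gi");
  return bodySource.replace(call, (match) => JSON.stringify(vm.runInContext(match, context)));
}

console.log(decodeCalls(prelude, body, "_0x227a")); // prints console["log"]("world");
```

The output still uses computed member access, console["log"], which is exactly the leftover the talk cleans up next.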
So we've done a lot without actually seeing the effect yet. Let's comment out these statements so we can see our transformations take effect before we do too much magic. After that, we print the refactored script to response.body and return the response, and bam, we're just about done. We go automatically to our payment page. Let's check our resources. Here's our script. The top still looks mostly the same, because we didn't change those top two statements, but everything afterward no longer has those function calls. And you see those computed member accesses, the square-bracket blocks right there? The JavaScript still works, but they're not easily parsable, at least by my brain. Now that we've translated all the function calls to strings, those accesses no longer need to be computed, so we can convert them back to static properties: instead of blah["string"], we can write blah.string. Well, not literally string, but the actual property name; you know what I'm talking about. This is where the statements we commented out come into play. script.convertComputedToStatic is part of the common-methods plugin, and the next two statements delete the two statements above our main script because we don't need them anymore: the strings have been decoded, and since all the strings have been decoded, we no longer need the decoder function either. So let's get rid of those and clean up our deobfuscated script. Let's load up Hackium again and check our script. And nothing's there. Oh, it is there; we had scrolled down too far. This is our script, and it's very close to the original, minus some identifiers that are still wonky, which there's not much we can do about. But you can see what it's doing. There's no misdirection anymore.
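The computed-to-static cleanup can be illustrated on a Shift-format AST: a ComputedMemberExpression whose expression is a string literal that is also a valid identifier name becomes a StaticMemberExpression. This is a hypothetical re-implementation of the idea, not the common-methods plugin's code. It only handles member expressions in read positions (Shift represents assignment targets like obj["x"] = 1 with separate ComputedMemberAssignmentTarget nodes), and the identifier check is an ASCII-only simplification.

```javascript
// Valid identifier names, ASCII-only simplification (ES also allows Unicode).
// Reserved words are fine after a dot in ES5+, e.g. obj.class.
const IDENT = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Rebuild the tree bottom-up, turning obj["name"] into obj.name where possible.
function computedToStatic(node) {
  if (Array.isArray(node)) return node.map(computedToStatic);
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) copy[key] = computedToStatic(value);
  if (
    copy.type === "ComputedMemberExpression" &&
    copy.expression.type === "LiteralStringExpression" &&
    IDENT.test(copy.expression.value)
  ) {
    // In the Shift AST, StaticMemberExpression.property is a plain string.
    return { type: "StaticMemberExpression", object: copy.object, property: copy.expression.value };
  }
  return copy;
}
```

Strings that aren't valid identifiers, such as obj["not-valid"], are left computed, since obj.not-valid would mean something else entirely.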
And it's real code. All right, that's it: just a few lines of JavaScript that completely deobfuscate obfuscated JavaScript. Now, I'm talking a lot about deobfuscation, but there's a lot you can do with all the JavaScript across the web. Even a site like Twitter.com has a huge array of user-friendly, intuitive methods that could effectively give you an API to all of Twitter, but they're hidden away in the JavaScript, where you can't touch them from the console or anything else. All you have to do is find where those beautiful little functions are and expose them to the global namespace, so you can access them from the console, from Hackium scripts, or whatever. If you're not a JavaScript expert: by exposing to the global namespace, all I mean is taking a chunk of JavaScript and prepending something like window.myExposedVariable = to the stuff you want to expose. That's it. Now you have an exposed API you can access easily from Hackium scripts, which lets you tweet via Node.js without a developer API key or anything like that.
Well, that's about all I can cover in this session. We smashed a whole lot of content into a very short period of time because I was so excited, and there's still so much more to talk about. This is just the tip of the iceberg. I barely dove into what shift-refactor and shift-interpreter can do. I didn't talk about plugins, the Chrome extension API, talking to the Chrome DevTools Protocol, or all sorts of other stuff I could talk about for hours or days. I'm going to put more of this content on YouTube, because I guess that's how people consume content nowadays, and I'll probably write some things up. If you like anything I've talked about, you can follow me on Twitter at @jsoverson; there's probably a link somewhere nearby.
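The "prepend window.myExposedVariable =" trick amounts to a small source rewrite, for example applied to response.body inside an interceptor before returning the response. The helper below is hypothetical and uses plain string matching, so `snippet` must appear verbatim in the source and be a complete expression. Wrapping the assignment in parentheses keeps it valid when the snippet sits inside a larger statement, and the original variable still receives the same value.

```javascript
// Hypothetical helper: rewrite the first occurrence of `snippet` so its value
// is also stored on window[globalName], without changing what the code does.
function exposeAsGlobal(source, snippet, globalName) {
  const at = source.indexOf(snippet);
  if (at === -1) throw new Error(`snippet not found: ${snippet}`);
  const wrapped = `(window[${JSON.stringify(globalName)}] = ${snippet})`;
  return source.slice(0, at) + wrapped + source.slice(at + snippet.length);
}

// The names here (createClient, config, myExposedApi) are made up.
const rewritten = exposeAsGlobal(
  "var api = createClient(config);",
  "createClient(config)",
  "myExposedApi"
);
console.log(rewritten); // var api = (window["myExposedApi"] = createClient(config));
```

A string match is fragile against minified code that changes between deploys; a more robust version would locate the node with a shift-refactor query instead.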
Actually, I think I'm supposed to be talking to people in Discord, so come talk to me there. I like this stuff. I like making the web do what I want it to do, and that's why I built all of this. I hope you enjoy it. Thank you, AppSec Village. Thank you, DEF CON. Thank you all for watching. This has been awesome. Thank you very much. Bye.