Thank you, mistress. So, yeah, that all just happened. Where do we even go from here, really? All right. Let's get started with the talk. Let's talk about some computer security and not wooden paddles. All right. What's that? Who asked that? I can't even see who that is. I'm in pain, but thank you. Oh, it's the girl that spanked me. Yeah, okay. Cool. Oh, no, I think you got me good. Thank you, though. Yeah. No, I'm good. Yeah. Thanks. Appreciate it. That was great. So, that's like a little hidden perk that they don't tell you about in the speakers package. You get a nice cool badge. You get access to the speakers room and a little bit of ass-touching. Anyway, let's go ahead and get started. So, welcome everyone. Thanks a lot for coming to my talk. I hope everyone's enjoyed the conference so far. This talk is on massive attacks with open source distributed computing. And, obviously, I'm going to tell you all what all those words mean together here in just a minute. So, I hope you guys enjoy it. So, who am I? Just so you guys know who's up here talking at you: I'm Alejandro Caceres. You can call me Alex. I'm the owner, founder, pretty much everything of Hyperion Gray. Hyperion Gray is a small R&D and open source startup. We're completely focused on the nexus between distributed computing and offensive security. I think there's huge potential in the field, and hopefully after this talk you guys agree with me. I studied physics back in college, and most of my research was focused on distributed computing for scientific experiments. Now I'm really just hoping to branch out into breaking shit with that. So, that's where I'm at. I'm also the founder of the PunkSpider project. Has anybody here heard of the PunkSpider project? Oh, sweet. More than like five people. That's more than I expected. Awesome. I won't say too much about it because we are going to get into it here in just a couple slides. So, don't worry about it.
So, a little background: I came up with this talk after I presented PunkSpider at ShmooCon. Word got back to the CEO of my company at the time that I was building a cyber weapon, when PunkSpider is really a community-focused web application security project. So, that's like the most ridiculous thing I've ever heard. After laughing about that for a minute, I got to thinking, you know, what would it take to actually build a distributed attack platform, right? The different examples that I'm going to show you here today are just what came out of tinkering with that idea. There are also three demos in this talk. It's really highly demo focused, so definitely stick around. It's going to be a lot of fun. My ass hurts, too. Anyway. So, let's get into it. To start off, distributed computing is really big right now, right? You've heard a lot about it. There's all kinds of IBM commercials and stuff like that. You hear "big data" a lot. It's a nice little buzzword. A big reason for that is that we've seen some really cool stuff come out that makes distributed processing of things really, really easy. The short of it is that pretty much all folks are doing with this is lots of powerful analytics. Kind of cool, right? Analytics are cool. We all like that. But I'm not really into that kind of thing. It bores me a little bit. So, I've been looking for more interesting use cases for distributed computing. One of the technologies that has come out is Apache Hadoop, which is an implementation of the MapReduce parallel programming concept. We're going to get all up in MapReduce here in just a few minutes, so I won't go too far into that right now. So, you might ask, if analytics bores you, what exactly is some fun stuff that we can do with distributed computing? The answer, of course, is massive attacks with open source distributed computing, which you might notice is the title of my talk.
So, what's the high-level idea behind distributed attacks? What exactly do I mean when I say something like massive attacks, right? What I'm talking about here is conducting really well-known, often effective attacks, stuff that has a relatively high rate of success, and then doing that hundreds of thousands or even millions of times in a really coordinated and effective manner. What I found in my research into this so far, and hopefully this isn't too much of a spoiler, is that you're going to break into so many things that part of the problem is going to be dealing with, like, what do I do with all this broken shit? What do I do with all this information from the stuff that I've broken? We're actually not going to get too far into what we do with that information afterwards. We're going to be more interested in the breaking of things, if you will. So, everybody with me so far? Cool. Some head nods? That guy's nodding. Cool. All right. So, let's define what we mean by a distributed attack. By this, I mean an attack that uses various computing resources in an effective and coordinated manner. So, why do we want to do this? Why is this going to be to our advantage? One reason is the time required to attack a massive number of things, and again, remember I'm talking about hundreds of thousands or millions of things all at once: it could take a really long time. You don't want to be waiting months, or potentially a year or years, for an attack to finish. That's not just annoying and impractical, it also allows the response teams at the particular targets that you're talking about to respond in lots of different and complex ways. So, we kind of want to bang out the attack, get in and get out sort of thing. Just to give you an example, picture a target of, like, 250,000 web applications associated with a particular target, right?
So, this could be, you know, every web application associated with a country, for example. So, let's say you try to run just basic web app fuzzing followed by, like, an automated SQL injection kind of thing. With a really optimistic estimate, doing this in a non-parallel way might take something like a minute per target, and again, that's pretty optimistic. That means you end up with something like 173 or 174 days to actually finish that attack. We definitely don't want to wait that long, for obvious reasons. And if you think that target number is unrealistic: you've heard me mention PunkSpider a couple of times already. We've done checks on about 1.3, 1.4 million sites so far, and our target is 250 million sites. So, it's a completely realistic target when you're talking about really, really large attacks. So, why else? Yeah, that's right. So, why else? Sometimes you need a little bit of coordination between your computing resources. To illustrate this, again, picture a large-scale attack on a massive network, maybe a fairly significant portion of the Internet, for example. And let's say that you realize that in order to conduct that attack on a large scale, you're going to need more computing power, right? Like I said, we don't want this attack to take too long. So, in a non-coordinated manner, maybe you spin up some cloud servers, something like that, and you just start a bunch of attacks, sort of in a dumb way, just how you'd expect. Maybe you have a little script that runs and executes a few shell commands on each machine, for example. You start running into a whole bunch of problems with that, right? If anybody's ever tried an attack like that, you know this. You might want to know when an attack has actually finished on one of those nodes, right? Because once a node has finished its part in the attack, you've just freed up some computing resources.
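[Editor's note: the back-of-the-envelope numbers above are easy to verify. A quick sanity check in Python, where the 100-slot cluster at the bottom is just an illustrative assumption:]

```python
# Serial-attack time estimate using the talk's (optimistic) numbers.
targets = 250_000          # web applications in scope
minutes_per_target = 1     # basic fuzz + automated SQLi check per target

total_minutes = targets * minutes_per_target
days = total_minutes / (60 * 24)              # minutes in a day
print(f"{days:.1f} days to finish serially")  # ~173.6 days

# With, say, 100 mapper slots running in parallel across a cluster:
print(f"{days / 100:.1f} days distributed")   # ~1.7 days
```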
And in order to make your attack as efficient as possible, you're going to want to be able to run more stuff on that node. That's not really going to be possible in this way; you could hack something out, but it's not going to be ideal. Another issue you run into is how do you actually make sure that your computing resources are pushed to the limit? You might have lots of different types of servers. Maybe you're running this out of your basement somewhere on commodity hardware. So, how do you actually know that all of these resources are being pushed to the limit and that you're using everything that's available to you? You can hack out some kind of threading code, something that monitors the resources on a particular machine and ensures that it's using all of them at once. But again, that's not going to be ideal, or you're going to spend a significant amount of time on it. And we want to be able to do this relatively easily. So, if all of this just sounds kind of hard to you: there have been some really great advances in the field that make this actually not that hard to do, that basically solve every single problem associated with using large numbers of nodes to conduct a coordinated attack. I'm going to get into talking about some of these and then move right into the three examples and three demos that I mentioned. For the most part, we're going to be talking about one of the best and most popular tools out there for distributed computing, which is Apache Hadoop. How many of you guys are familiar with Hadoop and know everything around it already? That's way more than I expected. All right. Cool. So, we do need to go over some background on what Hadoop is. Just bear with me if you already know all of this. Coming from a scientific background myself, I've used a couple of different protocols for message passing in distributed computing, like MPI, which mainly has support for Fortran and C.
So, I myself had to deal with a Fortran MPI application, and it was actually a real pain in the ass and not something that I would ever, ever want to do again. It was even Fortran 77, which is just ancient stuff. But if we get into how Hadoop works, which is through MapReduce, as I'll show you, if it's implemented right, which it is in Apache Hadoop, you get really, really easy-to-write code that can parallelize your tasks quickly, and you don't have to do that much work. I'm going to show you all how you would do that. So, I've mentioned MapReduce a couple of times already, but what exactly is it, right? How many of you guys are familiar with the MapReduce parallel programming concept? Cool. Awesome. Pretty good amount of people. So, let's say you have a problem that you'd like to distribute across nodes. This is how MapReduce works. What you would start out with is called a map function. I'm actually going to go very in-depth into what MapReduce is. It might appear a little bit confusing at first as to why we're doing things the way we are, but don't worry. There are a couple more slides on this that will illustrate all of that for you guys, and also a couple really good examples that make it simple. So, just bear with me if you don't get all of this at once. So, the first thing you do is write a map function. The map function is really simple. It just takes in data as key value pairs and outputs a set of key value pairs as its result. The map function is written such that it's a single operation on a single key value pair. So, as the person writing it, you're just writing this for one input record at a time. You don't have to worry about the massive amount of data. This operation is then automatically distributed across the cluster in Hadoop for each of your key value pairs.
Each machine in the cluster has the map function and a set of key value pairs that it's responsible for, doing whatever operation it is that you'd like your map function to do. So, I like to think of the map step as the part that generates somewhat processed big data, if you will, in a distributed manner. It's usually not the solution to your problem, although sometimes it can be. So, it's pretty simple. All it is is inputting key value pairs, running a map function leveraging all the machines in the cluster, and then outputting key value pairs after that. After the map step is done, you move on to the reduce step. There can be some intermediate steps for additional processing, but generally you would move to the reduce step. The input of the reduce step is really simply just the output of the previous map step. A partitioner is going to take the values from the map step with common keys and distribute them such that one node in the cluster is responsible for running the reducer function on all of the values with common keys. So, this is again distributed across the entire cluster. The reducer is usually the part that gives you the solution to the problem. And I know that was, like, a lot of words that I just said at you, and it might be a little bit confusing. There are a couple slides that might clarify this, if it's not completely clear to you yet. I'm actually going to shut up for a second and let you guys read through this for a minute. I'm going to get a little water, and then I'm going to read through it myself. And hopefully this will make things a little bit clearer along with the example after it. All right, so hopefully that was long enough, maybe not. But here's all that's happening, just in summary of what MapReduce is. You have key sub one, value sub one inputs to a mapper function, right? The map function is distributed across the cluster and yields a list of results as key value pairs.
So that can be, you know, something like key sub two, value sub two; key sub two, value sub three; key sub two, value sub four; and so on and so forth for each key value pair. All the values with the same key, key sub two, are logically grouped together, and a reducer function is then applied to each group in parallel, yielding something in return for each group. These would usually be what we would call our results. So a common question when you're first dealing with MapReduce is, well, why do we do it that way? What's the use of having the values with the same key grouped together? And I'm going to show you exactly why we do that here in just a minute, in the next slide. So a few things to keep in mind. Once you write a mapper and a reducer, Hadoop is going to distribute them to the remote nodes and slaves automatically. So you don't actually have to deal with anything that's actually distributing things, or with when things happen or where things happen or why things happen in the places that they do happen. Hadoop takes care of a lot of really important parts of distributed computing. Things like, like I mentioned, automated partitioning to remote nodes, and automated assurance that the job's going to get done. So if, for example, you have a node that goes down, Hadoop is able to detect that and just very seamlessly sends the work to another node. It actually takes it a step further, from dealing with nodes that go down to actually expecting that nodes will go down. So you can run it on really shitty hardware. I do all the time and get really solid results from it. So what else? There are also a few configuration items that you can set in Hadoop that are really useful. I mentioned before that you want to be able to push your resources to their absolute limit, right? You can do that very easily with Hadoop in just a couple of lines of configuration.
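[Editor's note: to make the key-grouping idea concrete, here is a minimal single-process sketch of the map / group-by-key / reduce flow described above. This is plain Python with no Hadoop involved; the group-URLs-by-domain example at the bottom is purely illustrative:]

```python
from itertools import groupby
from operator import itemgetter
from urllib.parse import urlparse

def run_mapreduce(records, mapper, reducer):
    """Simulate MapReduce in one process: map every input record,
    group the intermediate (k2, v2) pairs by key, reduce each group."""
    # Map step: each record yields zero or more (k2, v2) pairs.
    intermediate = [pair for rec in records for pair in mapper(rec)]
    # Shuffle step: Hadoop's partitioner routes all values sharing a
    # key to one reducer node; here we just sort and group locally.
    intermediate.sort(key=itemgetter(0))
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

# Illustration: group URLs by their domain.
urls = ["http://a.com/x", "http://b.com/y", "http://a.com/z"]
out = run_mapreduce(urls,
                    mapper=lambda u: [(urlparse(u).netloc, u)],
                    reducer=lambda dom, vals: (dom, vals))
print(out)
# [('a.com', ['http://a.com/x', 'http://a.com/z']), ('b.com', ['http://b.com/y'])]
```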
You don't have to deal with going to each of your nodes and figuring out some kind of code to make sure your resources are all being pushed to their limit. Hadoop will do pretty much all of that for you with just a couple of configuration items. Really cool. All right, so let's get into a specific example. First off, I have very few complaints about the distributed computing community around Apache Hadoop and all the really nice, useful work behind it. The one thing I could complain about is that if you look up MapReduce and just Google it, trying to find some really simple examples, the only freaking thing you're ever going to find is a word count example. That's really annoying, because once you start seeing the same example over and over, if you don't quite get it at first, you want to see another simple example that will help you out. It always seems to me like with Hadoop you're either reading a word count example or you have to pore through hundreds of lines of Java code to figure out something really, really simple. Also, word counts are really, really boring. All it does is count the instances of a word in a particular piece of text, so that's kind of lame. So I decided on a better example to give to you all. This example is a tool called PunkScan. PunkScan is a free, open-source distributed web application fuzzer that Hyperion Gray released, and it's what powers the PunkSpider project that I know I've mentioned a couple times and haven't said much more about, but we'll get into it. So picture the situation where you have a ton of URLs, potentially something like a few hundred thousand or even a million. We want to be able to perform a MapReduce job in Hadoop to fuzz these URLs quickly and search for vulnerabilities on the pages.
Another constraint that we're going to place on the job is that we want all the vulnerabilities associated with a particular domain to finish at the same time. This is going to help you out: you don't want a bunch of disparate URLs being returned as your results without any care for which domain they actually belong to. And this is where you're going to see that the way MapReduce groups common keys together during the reduce step is going to help you a lot. So are we still good? Everybody's still good? Get some head nods from everybody? Cool. What does the job flow look like within something like PunkScan? As I mentioned before, we start with the map step, inputting key value pairs. All we care about here is a list of URLs. So our input key is going to be none; we don't really care about a key in this case. Our URL is going to be the value. Essentially, that makes it just a dumb list, right? We're not associating any keys with the specific URLs that come in, not yet at least. So we write a mapper, which is going to be applied to each URL in parallel. The mapper essentially just fuzzes the URL using a really simple fuzzing library that I wrote and then determines the domain of the URL. And that's it. That's all the mapper is going to do. After that, it yields its output, which is going to be the domain of the URL fuzzed as the key and the list of vulnerabilities for that URL as the value. So any vulnerabilities that come out, add them to a list, and you get the key being the domain, the value being the list of vulnerabilities. Keep in mind that all of this is going to get distributed across the cluster for you. So the URLs are going to be fuzzed in parallel as much as possible.
And all of that's handled completely in an automated way by Hadoop, so we don't have to write any of that logic ourselves, which is really, really useful. Now, the domain of the URL fuzzed was the output key of the mapper as well as the input key of the reducer, right? Keep that in mind. So the reducer function for each group of URLs with a common domain is going to get sent to a single node for processing. The reducer is going to run in parallel for each group of URLs with a common domain, and all of that, of course, is going to get distributed across the cluster as well. But what you're seeing already is that each domain is going to get handled by a particular node at a time in a specific reduce step. Now, why is that actually useful? All the reducer function does, I think that vodka is hitting me right about now, by the way, I'm all fucked up, all right, anyway, all the reducer function does is combine the lists of vulnerable pages into one big list for a specific domain. Then it indexes them to a back-end search engine. In PunkSpider we're using Apache Solr as our back-end, which wasn't that tough a choice: we were writing a search engine, and Apache Solr is a search engine back-end. Overall, that's pretty simple, right? But how easy is it to code, really? I keep mentioning coding up a mapper and a reducer, but it's still kind of abstract to you guys. Like, what does that look like? I keep mentioning it's easy, too, right? But what do I mean by easy? Do I mean like a hundred lines of code, 200 lines of code? What is it? I wanted to show you this. Don't worry about actually reading all of it and doing a thorough code review or anything like that. Just take a look at it. You'll notice it's about 12 lines of actual code, and it's written in Python.
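[Editor's note: for readers of the transcript, the job flow just described looks roughly like this in Hadoop Streaming shape. This is a sketch, not the actual PunkScan source; `fuzz_url` is a stand-in for the real fuzzing library, and the tab-separated stdin/stdout protocol is what Hadoop Streaming expects:]

```python
import json
import sys
from urllib.parse import urlparse

def fuzz_url(url):
    """Placeholder for the fuzzing library: return a list of
    vulnerability records found on this URL."""
    return []  # a real implementation would inject payloads and check responses

def mapper(stdin=sys.stdin, stdout=sys.stdout):
    # Input: one URL per line, no key. Output: "domain<TAB>json vuln list".
    for line in stdin:
        url = line.strip()
        if not url:
            continue
        vulns = fuzz_url(url)
        stdout.write(f"{urlparse(url).netloc}\t{json.dumps(vulns)}\n")

def reducer(stdin=sys.stdin, stdout=sys.stdout):
    # Streaming hands the reducer key-sorted lines; merge the per-URL
    # vuln lists into one list per domain (which PunkScan would then
    # index to the Solr back-end).
    current, merged = None, []
    for line in stdin:
        domain, vulns = line.rstrip("\n").split("\t", 1)
        if domain != current:
            if current is not None:
                stdout.write(f"{current}\t{json.dumps(merged)}\n")
            current, merged = domain, []
        merged.extend(json.loads(vulns))
    if current is not None:
        stdout.write(f"{current}\t{json.dumps(merged)}\n")
```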
This one, this is our mapper right here, and up next is going to be our reducer, which is just ridiculously simple. It's like six lines of code. You might have noticed a couple of things in the mapper and the reducer. First off, as I mentioned, they are written in Python. What we've done is use a feature of Hadoop called Hadoop Streaming, which reads from standard in and writes to standard out to partition and set up the job properly. I don't want to get too far into how exactly you would use that, but suffice it to say it's just a bash one-liner to actually run a job in MapReduce after you've written your mapper and your reducer. So if you're the kind of person that really wants in-depth details on how to run all this stuff, how to write mappers and reducers properly, specific to offensive security, follow me on Twitter. I'll be giving you all those handles in a bit, or our blog, where we're going to be posting all this shit really in detail. So if you want all that, definitely keep in touch. Another thing I wanted to point out is that the mapper and the reducer that I showed you are really the only part of PunkScan that's distributed computing focused, right? In other words, if you were to actually download PunkScan, which you can off Bitbucket, you'd notice that most of it is actually pretty standard stuff. We're not doing anything too crazy to distribute this code. Essentially it's a standard fuzzing library that I've written, some Solr indexing stuff, some other fairly simple things, but then you also see a mapper and reducer, which again is the only part of it that's really distributed code. What I'm really trying to get at here is that there's nothing too mysterious about writing your own distributed computing focused code. If you understand the base concepts, you're really going to be able to write distributed attack code relatively easily. This guy's falling asleep, by the way, and that's killing me.
So hopefully that'll prevent that from happening. What's that? Drink me or him? Yeah, drinking will help keep him from falling asleep, right? Great idea. All right, so demo time. I keep mentioning that this talk has a bunch of demos, but all I've been doing is talking at you, so we need to stop that. First off, the first demo I'm going to show you is PunkSpider. Obviously, the first thing we want to do here is read the banner. We're providing a lot of vulnerability information on a bunch of sites that we don't own, so this banner is really important, and I do take it pretty seriously. The goal is to provide free information to website users and owners regarding website security status. So that means if you go on the site and look for vulnerabilities, what I'm really hoping it gets used for is this: if you're a site owner or a site user, you want to know the vulnerability state of that site, right? If you're out there giving your credit card number or any kind of personal information, you want to make sure that that's not being leaked all over the place. So that's really important. A couple things you can see here. Can everybody see that okay? Does that come out all right over there? Perfect. So a couple things we can do here. We can search by a particular URL or by the title of a site. We're just going to go ahead and search by a URL. Down here is where you specify the specific vulnerabilities that you'd like the sites you're searching for to have, right? We're going to go ahead and search for any site matching the search term I type in with any of these types of vulnerabilities. And these are blind SQL injection, SQL injection, and cross-site scripting vulnerabilities. Actually, I'm going to do you one better. We're going to search for every single site that has vulnerabilities in it. It supports wildcard characters, so you can just go in, type a little star, and you get absolutely every site that's in the database, and it's going to be pretty long.
So if we scroll down, you start seeing sites that are essentially a mess, right? These are vulnerable sites. If you were a user giving your personal information to any of these sites, you'd be pretty pissed off, right? Scroll down to the bottom, and we see the number of pages of vulnerable sites: 6,166. Just to be clear on this, because a lot of articles on PunkSpider got this wrong after we presented it at ShmooCon: this is 6,166 pages of vulnerable domains. Within each domain, we can have several vulnerable websites and vulnerable pages. We have 10 domains per page, so that's 61,660 vulnerable domains. And within each domain, if we go ahead and expand it, searching for one with more than one, within each page for each domain, we have several vulnerabilities. Anyway, long story short, what I'm trying to get at is there are a lot more vulnerabilities in here than 6,166. It's right up at about 300,000 or so, so far. And this was all made possible by using PunkScan. As I mentioned, PunkScan is what powers this on the back end. Making it distributed over actually a relatively small Hadoop cluster and pushing our resources really, really hard is what allowed us to do it. The main issue that we've had with PunkSpider is usually terms of service stuff. We try to run this stuff on cloud servers and run it through a bunch of proxies and things like that. But I guess they have some kind of monitoring on outbound traffic a lot of the time. And, long story short, we get kicked off cloud providers like all the freaking time. It's really annoying. Does anybody work for a cloud provider here, by the way? Actually, I won't say too much more because we have cloud providers here, but anyway. Let's get down to actually showing you a specific record. You can sort of picture the MapReduce job running here if you look at one of these specific records, right? We're looking at ajaxa.cn.
I have no idea what this site is or what they do. Something with Ajax, maybe, I don't know. So you can actually see the parameters. You can actually see us attempting the fuzzing. Here you see that this one's looking at, let me zoom in a little bit more, cut ID over here, and then we see it moving to the next parameter, page, over here, and then we just see it kind of moving down, right? So this is the map step that I was talking about. We're essentially just taking a URL, iterating through the parameters, attempting a few basic, really safe, by the way, injections, and reading the output. We're not doing anything harmful; it's used for good things and not bad things. Somebody's laughing over there for some reason. Anyway. So that's PunkSpider. And what made all of this possible, what allowed us to basically target the entire internet, is distributing this job, right? This actually would not have been possible otherwise. I mean, we'd probably have like 10,000 sites done here if we hadn't been distributing all this stuff and coordinating it in a really simple manner. So that's PunkSpider. What do you guys think of PunkSpider? Thank you, guys. All right. So I've shown you some stuff, and now I want to get into specific use cases of that. That was just an example to whet your appetite, right? The rest of what you're going to see is me showing you or explaining demos. We're going to cover three areas and the tools for them really, really quickly. Essentially, you just want to greatly speed up repetitive tasks, right? A lot of network or application reconnaissance on targets is just repetitive tasks when you're dealing with massive targets. We're not getting into really low-level complex attacks here. Our goal is common stuff that succeeds a lot. The only thing I really did want to say about distributed recon is to consider your problem, right? Are you in need of CPU, memory, bandwidth? What exactly is it that you're trying to solve?
So with PunkScan, we had the issue that we just needed faster fuzzing, right? And we had to figure out what would help us fuzz faster. Are we going to need bandwidth? Are we going to need CPU, memory? We actually had to do a little bit of pre-research in order to figure that out; the video goes into a lot more detail. The short of it is that CPU and memory were far, far more important than any kind of bandwidth. This turned out to be really useful for us, because we knew distributing the job was really going to help us. And it turned out it did. It helped us a ton. So just always consider your problem and be really careful before you write these things. All right, the next one is the really fun one. So, just don't misuse PunkSpider and don't attack the sites on PunkSpider. That's really not what it was built for, and I would be kind of pissed if I found out that people were actually using it for that. But now we're going to look at what we could do with that type of information if we were complete dicks. Well, why do we want to distribute the actual exploitation phase? Mostly because it's fun, and we want to speed up our attack, and we want help coordinating our resources. So the demo and example I'm going to show you is a distributed version of SQLMap. How many of you are familiar with SQLMap? It's essentially an automated database takeover and stealing tool kind of thing. Really, really cool tool. It was presented at DEF CON, I think, like four years ago, something like that. All of this stuff, the source code, is going to be available online immediately after the conference. So definitely take a look at it if you want to know more about it. It's in the proof of concept phase right now. Not what I would call a real tool just yet.
But if you're coming to DerbyCon, I am going to be presenting a really refined version of this that you can actually use and that has a lot more features than it has currently. So, my example: Mr. Injector. The reason for the name is MR equals MapReduce, injector because it injects, obviously. So MR Injector in my head turned into Mr. Injector, which I think is kind of funny. Literally nobody else has ever thought that that was a funny name for anything, but that's kind of just how I work. Also in my head, I picture it as a cross between a couple of things, not that that's entertaining in any way. So we're just going to move on. Let me set the stage for you here. This is the next demo. The screen you're seeing is divided into two parts. The left-hand side is SQLMap owning targets in a nondistributed manner. This is written just how you would expect: you have a simple Python or shell script that runs SQLMap on targets in a row. The right-hand side uses distributed computing through a cluster to conduct the attack. This is a real attack running on a testbed of servers. So even though it's a testbed of servers, this is an actual attack that we conducted. And what you're going to see is a series of, you're going to see those shells run, but you don't have to read that too much. Under them, you're going to see little red squares pop up each time a target has been owned. You can take a look, and what I really want you to pay attention to is the rate at which these things attack. It'll actually be pretty obvious what I want you to look for. But again, this is not a simulation. We didn't just do a bunch of calculations to see if this would work. We actually ran this attack and recorded it to show to you guys. We're also kind of jumping in in the middle of the attack. This is actually real-time. It's not sped up in any way or anything like that.
It's real-time targets being owned, and you see that obviously the right side is much, much, much faster. Even though, when I look at the left side, I'm always kind of pulling for it, right? I'm always like, all right, come on, little buddy, let's go. Hey, hey, there's another one. How many mappers? I believe we were running something like 10 mappers per node, 10 nodes. Yeah, so that's what it's running in parallel. So already you see that just a relatively small cluster with 10 nodes greatly, greatly speeds up the attack, right? That was 61 targets in 45 seconds, so we're under a second per target. And what makes this really possible is not just the fact that you have more computing resources. It's the fact that you're able to push those resources to their absolute limit with really simple code. You don't have to get into really complex stuff in order for that to happen. So my goal with this, what I really wanted to show you, is that these techniques actually work. Maybe there were some skeptics out there who thought, oh, well, bandwidth is going to be your limiting factor. But no, it actually works. So shut up, imaginary person.

This is an example of the mapper that I wrote. Actually, this is a really, really early version of the mapper that I wrote. It's really simple, right? It's Python code. All we're doing is running a simple subprocess.Popen, which just runs a shell command, and replacing the URL with whatever input we have. Since then we've gone back and refined it a little bit with the help of my friend Mark, who's right there in the red shirt. We've refined it a good amount, but this code actually works and runs really, really well. As you can see, this is like ten lines or something like that. Really, really simple. All right, so the output of the tool gets distributed across all the nodes.
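For illustration, a Hadoop streaming mapper in that spirit might look like the sketch below. This is not the original code; the command template here is a harmless `echo` stand-in where the real mapper would invoke sqlmap with the target URL substituted in, and the function names are mine.

```python
import shlex
import subprocess

# Stand-in command template; the real mapper would put an sqlmap
# invocation here, with {url} replaced by each input line.
CMD_TEMPLATE = "echo scanning {url}"

def attack(url, template=CMD_TEMPLATE):
    """Run one external command against one target, return its stdout."""
    cmd = shlex.split(template.format(url=url))
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode().strip()

def mapper(lines, template=CMD_TEMPLATE):
    """Hadoop streaming mapper: one target URL per input line,
    emitting tab-separated (url, result) records."""
    for line in lines:
        url = line.strip()
        if url:
            yield "%s\t%s" % (url, attack(url, template))

# In a real streaming job this would be driven by:
#   for record in mapper(sys.stdin): print(record)
```

Hadoop streaming then takes care of splitting the input URL list across nodes and collecting each mapper's stdout, which is how a ten-line Python script ends up saturating a cluster.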
It's fully accessible on absolutely any node that you have out there. So you don't have to worry about which node you're on in order to retrieve the output. You can be on any one of your distributed nodes, grab it from anywhere, and have that information right at your disposal for whatever it is you're into. So it's really, really convenient.

So where do we end up? We just owned a large number of targets, and what would be really cool is if only we had a really fast distributed password cracker. So I'm going to tell you about a really fast distributed password cracker that I wrote. We've conducted reconnaissance on a bunch of targets, right? We've exploited a massive number of targets and we've stolen a bunch of password hashes. These hashes could take a long time to crack. Maybe we're not using any kind of specialized hardware, right? We don't want to have to go out and buy a bunch of GPUs or something like that. We just want to be able to click a few things and crack some hashes, right? You might notice that in the previous examples I made the assumption that you can build, or have access to, enough machines to actually run a Hadoop cluster. That's actually not that hard; anybody can get one running in a couple hours. It's really, really simple. But let's say that you're just really busy and you don't want to deal with all that. What you want is to be able to click a few buttons and just have an instant cluster to use. So I'm going to show you how you can do that, and then crack a password over the cluster using Hyperion Gray's custom-built tool, which is called punk crack. So admittedly this wasn't trivial. You actually have to worry about how exactly you're going to partition this stuff. When we started this it seemed really simple, right? Each operation, you're hashing a string and then comparing it to another hash. Seems simple enough. It's easily parallelizable.
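As a rough sketch of why this parallelizes, and of the keyspace-slicing idea that comes up next: instead of shipping a gigantic wordlist, each mapper can receive a tiny (start, count) description of its slice of the keyspace and generate candidates on the fly. The names and the range format below are my own illustration, not punk crack's actual code.

```python
import hashlib

CHARSET = "abcdefghijklmnopqrstuvwxyz"

def candidate(index, length, charset=CHARSET):
    """Map an integer index to the index-th password of a given length,
    treating the keyspace as a base-len(charset) counter."""
    chars = []
    for _ in range(length):
        index, r = divmod(index, len(charset))
        chars.append(charset[r])
    return "".join(reversed(chars))

def partition(length, n_slices, charset=CHARSET):
    """Split the keyspace of all `length`-char strings into n_slices
    (start, count) ranges: one tiny job description per mapper,
    instead of a gigantic wordlist file."""
    total = len(charset) ** length
    step = -(-total // n_slices)  # ceiling division
    return [(start, min(step, total - start))
            for start in range(0, total, step)]

def crack_slice(target_md5, length, start, count, charset=CHARSET):
    """What one mapper does: generate its slice of candidates on the fly,
    hash each one, and compare against the stolen hash."""
    for i in range(start, start + count):
        word = candidate(i, length, charset)
        if hashlib.md5(word.encode()).hexdigest() == target_md5:
            return word
    return None
```

Slicing this way is what lets a few hundred mappers each chew through an equal share of the keyspace with no shared state and almost no input data to move around.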
Parallelizable. I think that's the word. I don't know. And it seems really easy to do. But what you run into is that Hadoop is expecting input in some usable form, right? If we were to just compute all the hashes and feed in a list from a file, it would crash for any reasonable password space, right? So it was a little bit complicated. We had to write our own little language that could represent a series of characters in order to distribute this job.

I think I'm actually getting close to the end here, but what I'm going to show you is spinning up a powerful cluster with Amazon's Elastic MapReduce to run this job and get me on my way, sort of thing. Really cool. Oh, and yeah, last thing. I know there are lots of ways to crack passwords. I'm not claiming this is the best way, the fastest way, the most efficient way, anything like that. I'm just saying it's an option, something you can have in your tool belt. If you don't mind spending some money on convenience for a cracker, this is a really good technique.

So let me see. It's actually a really long video. What's going on here? Let me go ahead and talk you through this. I start out... Okay. There we go. So I start out by showing a really screwed up screen. There we go. So I go here. Can everybody see that okay, by the way? Sort of. I'm going to walk you through it anyway. Don't worry about it. I'll walk you through it. It's okay. Full screen? How do you full screen in Windows Media Player? This little thing? Yeah, that didn't help at all. So all I'm doing is specifying a new what's called a job flow. This is essentially just setting your basic job configurations. I copy a few things. All I'm doing is telling it the location of the jar and a few basic arguments for the jar. I wanted to skip forward. This is a really, really freaking cool screen. So what's going on here? I'm specifying my instance types and my instance numbers.
So I'm telling it how large each node is. I have a master node, which is a pretty big machine on Amazon's EC2. For my one slave machine I set a cluster compute eight extra large, which is a 32-processor machine, which is pretty big. And then at the bottom over here, where you see the 17, I'm setting those to again really large. So I have about 19 nodes here. I really wanted to show you this demo with some extra zeros on that, so that would be 170 or even 1,700 nodes, and you could crack pretty much any password like that. But that does get a little bit expensive, and you have to be careful how you use it, because you need special permission from Amazon, and they already kind of hate me. Is there anybody from Amazon here? No? Okay. Amazon. Yeah. They have some really powerful stuff and some really cool stuff, but they hate me. So all we're doing here is configuring the nodes to... what? Sorry. Got to skip around a little bit. Anyway, what we're doing here, if you see down here where it says one bootstrap action created: what I did was specify one particular action to run across the cluster before your job actually starts. What I did there was set the number of mapper tasks. That's the number of parallel tasks that occur on each node, and that's what I'm talking about when I say you can push your resources to their absolute limit with really simple configuration. Long story short, because I'm running out of time, in case you can't predict what's going to happen: you crack the hash and it's done. And you did that in a completely distributed manner, pretty quickly. And that's punk crack. Okay. All right. Anyway, hope you guys have enjoyed it. No, you can't finish yet. No. No. Give it up. Wait, you need my time as well. Yeah, I need your time as well. Just scan there. Oh, okay. Rebecca, are you a first-time speaker at DEF CON? I am. You are? I am, but I presented earlier today. Oh, right. You were the guy with the thing. Right, okay.
We're not here for you because we already shot you. However, we learned about Rebecca. Rebecca, please come up. Did anybody see Rebecca's talk? Come on. Rebecca did not do a shot. Right. So we're going to fix that right now. Rebecca is going to start a new tradition: she is going to take Tylenol with her shot. That's awesome. I had a shot last night. It didn't go over so well this morning. No. No. Don't touch it. You're not done yet. All right. Thank you. All right. Here's to Rebecca. Thank you. Now you can finish. Thank you. All right. Go. Sorry to interrupt. Oh, you're fine. And thanks for coming. It's like the fourth shot I've had to do today because of this whole thing. And we're out of time. God, you missed. I'll give you another minute. Thanks, man. We hazed you enough. Thanks. I appreciate it. Yeah, I was thinking it was worth at least a minute. Anyway, I've definitely enjoyed this whole thing. God, I'm freaking hammered at this point. When, when, when you need to run massive attacks... one more shot. Can I do another one and then take another minute? What have you got to do? Considering I'm not going to have time to go home and shower before I get on an airplane, the people on both sides of me on the Southwest flight are going to love me. We're actually out of liquor. We're out of liquor. More booze, somebody. No? Not you again. I'm actually mixing the vodka with the rum. It's disgusting. And I'm doing this for you. You're welcome. All right, cheers. Cheers. So, distributed computing. Where do you even go from here? I mean, what do I even do? Where am I? Somebody say something. So I definitely enjoyed proving the concept to you here, but what exactly does this mean for you?
So leveraging distributed computing from an offensive perspective lets you run really powerful, massive attack scenarios, all using open source technologies and commodity hardware, shit you can get just by saying to your friends, I need a bunch of hardware, give me your old shit, and running it on there. So really, really cool stuff. Imagine pentesting massive targets with this. Being able to do this on an entire country would be awesome, so if anybody wants to hire me to do that, I'm definitely down for it. I really think that the security implications of this are broad. If we can feasibly simulate a massive attack scenario, we can better study it, better prepare for it, and see what exactly it's going to mean for massive targets like an entire country. So follow me on Twitter, dot slash punk. I'll answer all your questions, anything, anything. And if you want to know more about us, check out some more details on the presentation at www.hyperiongray.com, check out punk spider, check out our blog. I don't even know what that last one says. And thanks to everybody: Tomas, who, when I say we wrote this and we wrote that, it's usually Tomas. If it's not Tomas, it's Mark, which again is that guy right there. Thanks to Amanda, my girlfriend. Thanks a lot.