Hey, everyone. Thank you for making this your last DEF CON talk for this year. I really appreciate it, and I'll try not to disappoint. My name is Matt Weir. I'm a PhD student at Florida State University, so let's go ahead and talk about some password cracking. Before I begin, I just want to say I'm going to be using the word "we" a whole lot, and that's not because I have some British royalty complex or anything like that, but because I work with a really good team, and I just want to acknowledge them right now. My major professor, Dr. Sudhir Aggarwal, has been very hands-on with this, as has Professor Breno, and they've helped me out quite a bit. I would also like to thank the National Institute of Justice, along with the National White Collar Crime Center, for funding this research, which I really do appreciate, because that way I don't have to teach intro to C classes, but also because it provides us a little bit of legitimacy. When you have to go to your research ethics board to get this stuff approved, it really does help to be able to say, you know, "We're doing some password cracking research," and they're like... and then you can say, "But we're doing it to help catch and prosecute child molesters," and that kind of just gets us right on through. So, as I just mentioned, the main focus of my research is to help law enforcement. That's what we're really focusing on here. I'll freely admit that law enforcement is dealing with strong encryption, and quite honestly, there's no way I'm going to be able to break AES. I just don't have the brain power to do it. So instead of attacking the encryption, which is an extremely hard, if not almost impossible, problem, we're going to be attacking the user, because that's much more doable.
So what we're really trying to do is create better models of how people create passwords in real life, and then use those models to attack password creation strategies. To do this, the first thing we need, of course, is an idea of what people create passwords with. So we spend quite a bit of time just going out on the internet and trying to find any disclosed password lists that are out there. These are lists that have popped up on the internet for whatever reason: a hacker will break into a site and post a password list online, saying, "Ha ha, look how great I am," and we definitely try to grab those and analyze them. There are also a lot of password cracking forums and bulletin boards, and we definitely fish those as well. Now, occasionally those passwords are in plain text, and that's great, because all we need to do is read them and we're done. But the vast majority of these passwords are hashed, and this is a real chicken-and-egg problem for us, because in order to figure out how people create these passwords, we actually have to crack them first. That's actually kind of a good thing for us, though, because it forces us to focus on the practical. We can't just sit back, sip wine, and write research papers. We have to apply what we're talking about to real password lists and see if it works. On the flip side, what's really been interesting to me is how this applies to the defender. How can we make password creation strategies better based upon our knowledge of cracking passwords? That's actually a much harder problem, in my opinion. It's much easier to break something than it is to fix it. So I'm going to talk to you a little bit about that as well.
Now, for this talk, I'm really going to try to avoid focusing in too much detail on tools or trivia, because if I spent the next 50 minutes going over every single command-line option of John the Ripper, or going through pages upon pages of letter frequency analysis, or how many people use sports teams in their passwords, that wouldn't really be a good use of your time. I mean, that's why we invented the Internet. I do maintain a password cracking blog, and I keep all my tools online as well, and I occasionally try to document them. So please go there if you're interested in more of this. If there's anything that you'd like us to look at, please let us know. If you think, "Hey, you're doing this stupid, this is a better way," we really do appreciate those types of comments as well, just as long as you include the "this is a better way" portion. On the slides, I also try to include links to the web pages for the other tools that we talk about. So when something flashes up on the screen, you don't have to frantically write it down. You can just download the slides. I also included quite a few of the tools on the CD itself. Real quick, about a couple of those tools: we have our dictionary-based rainbow tables online, a lot of the password cracking scripts that we run, some of the rule sets that we use, some of the parsing scripts that we use to analyze password lists, and a couple of custom word lists that we made as well. Of the custom word lists, probably the most notable is that we probably have the largest collection of one-line ASCII art porn on the internet. Yeah, my parents are really proud of me, as you can tell. But people create passwords that way, so you have to attack that.
Okay, also, I'm not going to focus on, "Oh my god, passwords suck. Users choose horrible passwords, we're all doomed." Because I don't believe that. Users do choose horrible passwords, but just ignore the rest of it. I mean, yeah, I could spend the entire time making fun of users, because they're hilarious. But really, that's not that useful. We've known for years: fire is hot, water is wet, and given the chance, a user will choose the password "password123". That's not new knowledge. I think as a security community, we do need to figure out what we're going to do about that, though. Also, I won't say that I'm optimistic, but I'm not overly pessimistic about the future of passwords either. We're stuck with them, and we're going to be using them forever, because they are really nice, and in a lot of respects they just blow everything else out of the water, with the exception of security. But password cracking is getting hard, because people are actually starting to use strong encryption. They're starting to use stronger password hashes and treat those passwords better. And I think that's actually why we're seeing such an explosion in password cracking research right now, because LAN Manager is going away. Windows 7 hopefully will put the stake in the heart of XP. We're starting to see less and less WEP, even though that's still all over the place, too. So right now, we are getting to the point where you actually do have to work at cracking passwords instead of just trivially breaking everything. And that's actually an improvement. So, if I'm not going to talk about tools or trivia, or make fun of users, what am I going to do? What I really want to focus on is, what does a password cracking session really look like? What are the real techniques that we're using? How do we start out?
How do we go about trying to attack these lists? In particular, what I really want to do is focus on two case studies. We've been collecting a lot of passwords from everywhere, but these two were pretty cool, because they're very big lists and they're very public. So we're going to talk about cracking the phpbb.com list, and then our results cracking the webhostingtalk.com list as well. I'm going to start off with a little bit of a password cracking overview. It's not a CISSP prep course; it's mostly so we can agree on some basic language and terminology. And because I'm probably going to be running almost over, I'll definitely be available in the breakout room afterwards for questions. We are starting to do some research on TrueCrypt. It's nothing advanced, and we're in the beginning stages. I can talk about some passwords and non-standard passwords and some different ways we're looking at attacking those, and so on. So please come to the Q&A room and grill me. I'd love to hear what everyone thinks about this type of stuff. So, moving right along. There are really two different types of password cracking, and they are very different. The first type is online. This is what you see in all the movies. You don't have access to the site at all, and what you're trying to do is gain access to it. So you're trying different username and password combinations to try to log in. The main thing about this is that the defender has a lot of options available to make the attacker's life hard, which is really good, because otherwise we'd be screwed. But the main thing is that the defender can really limit the number of guesses that you can make.
What we're really focusing on in our research, though, since we are thinking about cops kicking down the door, is offline password cracking. This is more of a forensic setting. The cops obtained a warrant, hopefully first, then kicked down the door, seized a hard drive, and found TrueCrypt encrypting the hard drive. Now they are really only limited by the amount of computer horsepower and time they can throw at the problem. For these two case studies, though, what really happened was that an attacker broke into a website and managed to download all of the passwords and usernames. Now the attacker really is only limited by the amount of time they can throw at the problem. Of course, the question is, why is the attacker bothering to crack the passwords when they've already gained full root access to the site? And the answer, of course, is that people use the same password everywhere. So you want to gain access to all of these users' webmail accounts, bank accounts, PayPal, or whatever. Now, we don't do this. I really do want to stress that: we never verify these passwords. Even though we cracked them, we never try to log into the sites, and we never try to log into any other sites. I'm just saying this because there are a lot of feds in the audience, but it really is true, too. Okay, so there are really three different steps in cracking passwords. I do want to differentiate these, because I'll be referring back to them later, and because I've run into problems with each of these three stages. First of all, everything I'm going to be talking about is based on offline password cracking, so the forensic setting. You've already obtained the password hash, or you've got the fully encrypted hard drive. The first step, then, is just to make a guess at what the user created their password as.
So, of course, the first guess is going to be something like "password123". The second stage is that you've got to hash the guess using whatever hashing algorithm is in use. Ideally, this is where you'd be spending the vast majority of your time. But a lot of sites just use a simple MD5 hash, which is very quick, so this goes by even faster than this slide. Also, when you're dealing with full hard drive encryption, what you're really doing is taking the user's password and turning it into an encryption key in order to decrypt the file header. But it's pretty much the same thing. Step three is to compare the result against the target hash and see if they match. If they do match, you've cracked the password. This actually gets much more difficult when you're dealing with a larger password list, as it takes more time to do the comparisons. The vast majority of the time, though, it doesn't match, so you throw out the result and start again. And that's password cracking in a nutshell. It's just these three things repeated over and over and over again until you grow bored and quit. Now, I'm going to mention password salts quite a bit as well. Mostly, when I'm talking about the Web Hosting Talk list, I'm talking about a salted list. Basically, all a salt is, is a value added to people's passwords so that even though two people might choose the exact same password, which happens quite a bit, their password hashes are going to be different. That's really important. It helps improve security quite a bit. If you're not using a password salt, you really do need to, especially on any online passwords that you're storing. But that's it. It's not magic. Now, people have a tendency to want to use the username as the password salt (Microsoft does this), but that's a really bad idea, because a password salt also helps protect against hash lookup attacks.
That's where an attacker generates all the password hashes ahead of time, and then when they need to crack a password, they just do a lookup on it, and it's very quick. If you have a password salt, though, they can't do that unless they generate a hash lookup table for every single password salt. Now, your Microsoft domain logins are stored with a password salt of your username. In that case, an attacker might not want to generate a table for Bob in accounting. But there's this user named Administrator. He or she is really popular. I mean, they're everywhere. People really trust him or her. So that's just kind of a fail right there. Now, a couple of important points about the password salt. First of all, it's not secret. It's good if you can keep it secret, but its security doesn't depend on that. The secret part is the user's password in the first place. Second, the user doesn't need to know it. It's not something that the user ever has to see. It's stored on the server, so as far as the user is concerned, the password salt doesn't exist. Third, it should be unique per user. I'll get into this more, but it provides a kind of herd protection when someone is trying to crack a lot of passwords, and we'll see that to a pretty extreme degree when we talk about the Web Hosting Talk list. And finally, it's not some magical win button. It doesn't provide perfect security, no matter what it seems like everyone tells you online. Basically, if the attacker is only trying to target one user in particular instead of a whole bunch, all the password salt does is protect against hash lookup attacks. All the other attacks the attacker has are still definitely open. So you still have to make the password hash computationally expensive in order to limit the number of guesses an attacker can make. So, that wasn't too painful, was it?
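To make the three stages and the effect of a salt concrete, here is a minimal Python sketch. It is an illustration only, not one of the tools from the talk: the guess list, the salt values, and the way the salt is simply prepended to the password are all assumptions made for the example.

```python
import hashlib


def md5_hex(data: str) -> str:
    """Stage 2: hash a candidate string with MD5 and return the hex digest."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def crack(target_hash, guesses, salt=""):
    """Run the three stages: make a guess, hash it, compare to the target."""
    for guess in guesses:                      # stage 1: make a guess
        candidate = md5_hex(salt + guess)      # stage 2: hash the guess
        if candidate == target_hash:           # stage 3: compare
            return guess
    return None


# Hypothetical guess list, for illustration only.
guesses = ["password", "password123", "monkey"]

# Unsalted case: one hash computation per guess covers every user.
unsalted_target = md5_hex("password123")
found_unsalted = crack(unsalted_target, guesses)

# Salted case: two users pick the same password but have different salts
# (the salt values here are made up).
alice_hash = md5_hex("a1b2" + "password123")
bob_hash = md5_hex("c3d4" + "password123")
hashes_differ = alice_hash != bob_hash

# A lookup table precomputed without the salt no longer helps...
lookup = {md5_hex(g): g for g in guesses}
lookup_hit_unsalted = lookup.get(unsalted_target)
lookup_hit_salted = lookup.get(alice_hash)

# ...but a targeted attack that knows the salt still works.
found_salted = crack(alice_hash, guesses, salt="a1b2")
```

The last two lines are the point made above: the salt defeats the precomputed table, but an attacker going after one user just hashes each guess with that user's salt, so the hash itself still has to be expensive.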
Let's get on to the cracking. Okay, so everyone always asks me what my hardware setup is. This is my home computer, where I started doing all the cracking, because that way I didn't have to go into the lab, which was wonderful. But it's nothing crazy. It's about a two-year-old computer. I like it, but it's not some massive password cracking machine. I also use this MacBook quite a bit. So, two computers. Unfortunately, a couple of weeks after I started, the power bill came in, and it had gone up about 75 percent. Now, there were a lot of other factors that caused the power bill to go up. I'm very convinced of this. But it's really hard to have that conversation with your roommate when your computer fans are blaring in the background and they've been doing that nonstop for the last couple of weeks. It also kind of hurt that the power bill went back to normal after I stopped. So, yeah, I now do pretty much all my password cracking on our lab computer at school, which I'd previously been using to generate dictionary-based rainbow tables. But when you hear people talking about racks of PlayStation 3s or botnets cracking passwords and stuff like that, we don't have that. I mean, dude, we have a Dell. So, when you're looking at threat modeling, when I say something's hard, I mean it in this context right here. An attacker definitely can throw a lot more resources at this. So, the phpbb.com list; I'm going to refer to it as the PHP list from now on, so I don't stumble over myself. The site was hacked on January 14th, and the list was posted online in early February. And when I say the list was posted online, the attacker posted a ton of information about the site: password hashes, usernames, user email accounts, and also a really interesting write-up of how they did the attack, which is pretty good reading.
Now, the list itself contained about 259,000 unsalted MD5 password hashes with associated usernames. It also had 83,000 salted hashes using the phpBB3 hashing algorithm. Basically, what happened was that they upgraded their site, and the new version used a stronger hashing algorithm, but anyone who hadn't logged in since the upgrade was still stuck with the old hash. We only attacked the MD5 hashes, since we are doing this in a research setting, and attacking a salted list is a pain in the butt, so we don't really learn that much from it. So we only really focused on the MD5 password list here, and that's great to hear as a defender, okay? You salt your password list, it's too much trouble, and the person just goes on to something easier. But I want to stress once again that if your list gets disclosed and it's salted, you still have to treat it as a serious event, because it might be worth an attacker's while to attack it, and I'll get into that with the Web Hosting Talk list. One other interesting thing is that the attacker actually tried to crack a lot of these passwords himself or herself. The hacker submitted about 117,000 of them to an online password cracker, and it took them about a one-to-two-week period just to submit all of them. Of those, they cracked 28,000 passwords, or about 24%. And he posted those cracked passwords online as well. So when you see people doing analysis of the PHP list, they're generally analyzing that 24% that the attacker cracked, which, as you can tell, is probably the weakest 24% of all of them. I have to say, though, that the attacker's results were pretty much par for the course from what we've seen. There's a great and wonderful site called hashkiller.com. Most of it's in German, but Babelfish is getting much better.
Not only do they have their own online password cracker, and not only do they have online password cracking forums and stuff like that, but they also keep track of how efficient all of the other password crackers out there are. Most online password crackers average about a 20 to 40% crack rate. So if you submit password hashes to them, they'll crack about 20 to 40% of them. There's actually a tool out there called MD5 Utilities, which was done by some of the people on the remote exploit board, that will submit a list of password hashes to pretty much all of the sites. It's something like 33 sites, I think, at the current count. And taken in aggregate, all these online password crackers do pretty well. What I want to warn you about, though, is that there are some serious privacy concerns with doing this. If you don't think that these people are collecting and keeping the password hashes you submit to them, you're a very trusting person. On the plus side, this opens up a couple of different honeynet options as well. Moving on. If you want to crack the passwords yourself, there are a lot of different password crackers out there because, as you can imagine, this has been a problem for quite a while. Of all of them, though, I have to recommend John the Ripper. First of all, it's free, and that's wonderful. But the main reason, and the reason why it beats out even most paid-for password crackers, is that it's been around forever and the source code is available. So basically, if you can think of a problem, someone has probably already implemented the solution in John the Ripper. It's like the frickin' Simpsons of password cracking. So that's really good. It also has one option that I haven't really seen in any of the other major password crackers, and this option alone makes it worthwhile for me. It's called stdin.
Basically, what this says is that, going back to the introductory slides, I don't want to have to deal with all of the hashing and cracking and stuff like that from stages two and three, but I don't like the built-in John the Ripper rule sets. So what I can do is write my own program and pipe all the guesses directly into John the Ripper, and John the Ripper will hash them and try to crack the passwords. If you can code it up, you can use it with John the Ripper, and that's extremely powerful. Now, I've got to admit something here. I was an idiot, and I know that's hard to believe, but older versions of John the Ripper really choke when you feed them a large password list. It just takes forever to make a guess, because the comparison portion was implemented pretty poorly. Now, there was a patch. It was released back in January. It was actually based on a patch someone else had done that's been out for a couple of years, and I didn't realize this. So I went through a lot of pain dealing with an old version of John the Ripper, to the point where I actually even started using Cain and Abel a little bit, which I'm still ashamed of. But really, all I have to say is that John the Ripper is still being updated and still being maintained. Definitely use a new version. And the mailing list is actually very active, which surprises the living daylights out of me. So, using John the Ripper, here's kind of the timeline of the cracking attack. Four hours, and I have "four hours" in quotes because it took me a lot longer to parse the password hashes out and do all that other stuff. But once I started, after four hours I'd cracked 38% of the passwords, and we're going to see that that's really slow, actually. After one week, I'd cracked 62% of the passwords. After one month and one week, I'd cracked 89%, which is when I submitted this DEF CON talk.
And currently, I'm at 95% of the total passwords cracked and [unclear] of the unique hashes. The numbers are different because a lot of people choose the same password. So that's pretty bad, actually. And I'm not the only person having this kind of success. After my DEF CON talk was posted, I talked to a couple of different people. For example, Brandon Enright has also cracked about 95% of the MD5 hashes. He cracked 2,500 that I missed, and I cracked 2,600 that he missed. We haven't exchanged the plain-text password lists, due to the privacy issues; we were just exchanging "these are the ones I cracked," "oh, these are the ones I cracked," okay. I actually got an email from another person just the other day, and he says he cracked 97% of them. And I believe him, because we're still cracking passwords, and I'm definitely not the most elite password cracker out there. I'm just a student trying to graduate. So once again, there are probably a lot better ways to do this, and you should take everything I say with a grain of salt. The reason we cracked this many wasn't because I'm some elite hacker. It's because, in general, those passwords are really poor. The average length of all the passwords we cracked was 7.2 characters, and we see that almost universally across different sets. What was kind of surprising about this list, though, was that only 6% of the passwords contained an uppercase letter and only about 1% contained a special character. For almost all of them, if they did contain something else, it was numbers, and a lot of them contained quite a few numbers. But 51% of the passwords contained only lowercase letters, which makes them a lot easier to crack. I just want to stress that this doesn't count the 5% of passwords we haven't cracked. I'm going to assume they're a little bit stronger, just for my ego.
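The kind of summary statistics quoted above (average length, share with an uppercase letter, share with a special character, share that is lowercase-only) can be computed with a few lines of Python. This is a sketch rather than the parsing scripts from the talk, and the four sample passwords are made up for illustration; they are not from the phpBB list.

```python
def password_stats(passwords):
    """Return simple composition statistics for a list of cracked passwords."""
    total = len(passwords)
    return {
        "average_length": sum(len(p) for p in passwords) / total,
        "pct_with_uppercase": 100 * sum(any(c.isupper() for c in p) for p in passwords) / total,
        # "Special" here means anything that is not a letter or a digit.
        "pct_with_special": 100 * sum(any(not c.isalnum() for c in p) for p in passwords) / total,
        "pct_lowercase_only": 100 * sum(p.isalpha() and p.islower() for p in passwords) / total,
    }


# Hypothetical sample, for illustration only.
sample = ["password", "monkey12", "Football", "qwerty!"]
stats = password_stats(sample)
```

On a real list you would run this only over the passwords you have actually cracked, which is why the uncracked remainder is excluded from numbers like these.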
One thing I really want to stress about these password cracking sessions, though, is that we do have limited resources. We can't just brute-force everything. What that means is that we have to have an idea in our head of how the user created that password, and that means there are a lot of password creation strategies, a lot of types of passwords, that we're just not going to try, because we don't have the time to do it. You just have to accept that. So what we're trying to do is crack the majority of passwords, not all of the passwords, and you really have to have that mindset when you're doing password cracking. For the defender, that can be pretty good. It's easy for an individual, or actually it's not that easy, but it's easier for an individual to create a password strong enough that you can't brute-force it, and then do something that no one else does. Then we're just not going to try that attack, because you're the only person doing it. It's really hard, though, from an organizational perspective to make everyone unique. We just don't do that. As human beings, we can't do different. It's just not really built into us. And that's why password cracking is probably going to keep on working pretty well. Now, the first type of attack, and I'm sure you all know about this one, is the dictionary attack. I just want to stress that when I say that, I mean not only using an input dictionary, but also all of the associated word-mangling rules that you apply to it, like adding two numbers to the end or capitalizing the first letter. There are two main reasons why a dictionary attack can fail. First, you just didn't try the right dictionary word. So the password was "zebra123", and "zebra" wasn't in your input dictionary. The other way is that you just didn't try the right word-mangling rule. In that case, you just didn't try adding "123" to the end of the word.
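The two failure modes just described are easy to see in a small sketch. The Python below is an illustration under assumptions: the rule names, the tiny dictionaries, and the use of unsalted MD5 are chosen for the example and are not John the Ripper's actual rule engine.

```python
import hashlib


def md5_hex(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()


# Hypothetical word-mangling rules: each maps a base word to one guess.
ALL_RULES = {
    "as_is": lambda w: w,
    "capitalize_first": str.capitalize,
    "append_123": lambda w: w + "123",
}


def dictionary_attack(target_hash, words, rules):
    """Try every rule on every dictionary word; return (guess, rule) or None."""
    for word in words:
        for rule_name, rule in rules.items():
            guess = rule(word)
            if md5_hex(guess) == target_hash:
                return guess, rule_name
    return None


target = md5_hex("zebra123")

# Success: right base word and right rule.
hit = dictionary_attack(target, ["monkey", "zebra"], ALL_RULES)

# Failure 1: "zebra" is not in the input dictionary.
miss_word = dictionary_attack(target, ["monkey", "password"], ALL_RULES)

# Failure 2: the dictionary has "zebra" but the append-123 rule is missing.
fewer_rules = {k: v for k, v in ALL_RULES.items() if k != "append_123"}
miss_rule = dictionary_attack(target, ["monkey", "zebra"], fewer_rules)
```

The total number of guesses is the dictionary size times the number of rules, which is the budget that has to be split between the two.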
And you really have to balance the two, because you're always limited in the amount of time and the number of guesses you can make. The bigger your input dictionary, the fewer word-mangling rules you can apply to it. If you want to apply more word-mangling rules, you have to use a smaller input dictionary. What a lot of people do is run multiple input dictionaries. They'll run a really small input dictionary with tons of word-mangling rules applied to it, and a much larger input dictionary with only a few word-mangling rules. Now, this is a little controversial, but it seems like when people start cracking passwords, the first thing they do is try to collect as many input dictionaries as possible. It really falls into this crazy-cat-lady syndrome, where instead of just having a couple of well-maintained input dictionaries, they have 50 or 60 of them running around the house pooping all over the place. What I really do believe is that if you get these input dictionaries, and some of them can be extremely large, you have to keep in the back of your mind how you are going to use each one and how it fits into your cracking attack. You do spend a lot of time maintaining these input dictionaries, so it pays to focus on just a few. That being said, if there's no password creation rule enforced, as with this PHP list, people aren't using word-mangling rules. People hate using word-mangling rules. If you give them a chance, they'll just type in "password" and be done with it. In that case, chances are that if you don't crack a password, the reason is that you're just not trying the right base word. So the bigger the input dictionary you have, the better off you are.
Probably one of the best big input dictionaries I've ever seen was done by Sebastien Raveau, and I know I said his name wrong, and I really do apologize. What he did was download Wikipedia, all of it, and all of the sister projects as well, and create a massive word list based upon that, because I guarantee that if someone knows of a word, it's in there. So that's an extremely good one. But it's ginormous, so you really can't apply a lot of word-mangling rules to it. It is great for catching those lower-hanging passwords, though. The dictionary is available on his website, and I can't give him enough props for that. Now, if there is a password creation policy, or you are getting into the upper levels where you're trying to crack stronger passwords, you do have to use a smaller input dictionary. And by far the best of those input dictionaries are based upon previously cracked passwords. I really do apologize, but this is the one type of dictionary that I can't release to you guys, just because of all the privacy issues with that. But it really is a rich-get-richer type of thing, where the more password cracking you do, the better you get at it, simply because these lists grow. We're actually starting to see a lot of the same users in different lists, where they were part of different compromised websites. That's not actually the user's fault; they just got unlucky. But yeah, people use the same password all over the place. What's also really useful is to extract the base word from these passwords. For example, the password "TigerWoods1982" might not be that common, but people use "Tiger Woods" in their passwords all of the time. So you want to extract that. I included some tools on the CD that make that much easier, and those work really well.
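As a rough illustration of base-word extraction, the sketch below strips leading and trailing non-letter characters and counts what is left. It is an assumption-laden simplification, not the tool shipped on the CD, and the cracked-password list is invented for the example.

```python
import re
from collections import Counter

# Matches a run of non-letters at the very start or the very end.
_EDGE_NON_LETTERS = re.compile(r"^[^A-Za-z]+|[^A-Za-z]+$")


def base_word(password: str) -> str:
    """Strip digits and symbols from both ends and lowercase the remainder."""
    return _EDGE_NON_LETTERS.sub("", password).lower()


def build_base_dictionary(cracked_passwords):
    """Count base words so the most common ones can be tried first."""
    counts = Counter()
    for pw in cracked_passwords:
        word = base_word(pw)
        if word:  # skip passwords that were all digits or symbols
            counts[word] += 1
    return counts


# Hypothetical cracked passwords, for illustration only.
cracked = ["TigerWoods1982", "tigerwoods!", "12tigerwoods", "monkey7", "123456"]
counts = build_base_dictionary(cracked)
top_word, top_count = counts.most_common(1)[0]
```

Each full password here appears only once, but the shared base word shows up three times, which is why the extracted list makes a better input dictionary than the raw passwords.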
Now, for the word-mangling rules, the default ones in John the Ripper are really nice, but they only crack weak passwords. They're designed to crack weak passwords really well, but if you're trying to crack any type of strong password, they're just not going to work. You have to make your own word-mangling rules for that. I made some of my older John the Ripper rules available online for download. The only other ones I've ever been able to find online were posted by Minga on the John the Ripper mailing list. But yeah, you have to make your own rules if you really want to crack strong passwords. One thing I've been working on is trying to identify new rules, things that we just didn't think about. So another tool I included on the CD tries to parse a password list into two sets: passwords where we already know how they were created, like "password123", and passwords where it has no idea how they were created. That's really nice, because you don't want to have to look through 200,000 instances of things like "password123" when you're trying to find new rules. It gives you a very small list where you can go, "Hey, look, people use emoticons, smiley faces, in their passwords," or you identify new keyboard combinations, and it makes that much, much easier. Now, as I said, I've been moving away from John the Ripper's built-in rules and working on some of our own ways of generating guesses. Probably what we've had the most success with is our probabilistic password cracker. Whenever I mention this to somebody, the reaction is universal. Their eyes kind of glaze over. They nod nicely, and they go, "Well, that sounds very interesting," meaning they think it's bullshit, "but, you know, what input dictionaries are you using?" and stuff like that. And they try to move the conversation along.
And I have to say, this is something that I really believe in strongly. I mean, I've been drinking the Kool-Aid, and we've been using this, and it just keeps on getting better and better. And really what we're trying to do here is this: you hear about different optimizations all the time. You know, there are certain words that are used more often than others, like password, monkey, and football. Certain word-mangling rules are used more often, like adding 123 or 007. People tend to capitalize the first letter more than anything else, and so on. The problem, though, is when you're creating these password-cracking rules, it becomes very hard, on an ad hoc basis, to take advantage of all these optimizations. Because you have to have a rule like try all dates at the end, and then you have to have another rule that says try all four-digit numbers after a password, except for those dates that you tried beforehand. And this gets to be a real pain in the butt, especially when you have multiple word-mangling rules or multiple optimizations that you're doing. Probably the most extreme example of this I saw was one person on the John the Ripper mailing list who mentioned they had a John the Ripper config file with over 12,000 rules, just trying to take advantage of all that. Quite honestly, I'm lazy. I don't want to have to do that. So really what this probabilistic password cracker does is just automate this for us. And what's really nice is that we're assigning a probability to a password guess based upon all the different information that we know about it. And this allows us to go ahead and rank password guesses in probability order. And this allows us to switch between different word-mangling rules.
So, I mean, we can determine whether we want to try, you know, a very common word with a very uncommon word-mangling rule before we try a very uncommon word with a very common word-mangling rule, and vice versa. And we've been having a ton of success with this. So, we kind of break this into two different parts. One, we have a training program that automatically parses a known password list and generates what we call a grammar that has all this information in it. And then we have the actual password cracker itself that uses these grammars to go ahead and attack passwords. And as I said, we assign a probability to just about everything: the probability that certain dictionary words are going to be used, the probability that certain word-mangling rules, like adding three numbers to the end of a password, are being used, and the specific replacements, like whether a two-digit number is the number 12 versus the number 21. Now, I could keep on talking about this, but what I really want to do is have something blow up in front of everybody here, which is probably what's going to happen. So, we're going to do a live demo. Now, I don't expect you to read all this here, but basically what I'm going to be doing is I'm going to run our probabilistic password cracker. And I'm going to be feeding it two different input dictionaries. It actually supports up to ten if you really want to get crazy there. One input dictionary just contains common passwords. So, I'm going to assign it a higher probability, since they're very common. And I'm going to use a much larger input dictionary of less common words. And this password cracker is going to switch between the two of them. What I'm going to do after that, though, is I'm going to pipe it right into John the Ripper, because as I said, I don't want to write my own password hashing algorithms and stuff like that.
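To make the train-then-rank idea a bit more concrete, here is a toy Python sketch. It is not our actual cracker or its grammar format. It assumes a deliberately tiny model in which a password is just letters followed by optional digits, and every name in it (`train`, `ranked_guesses`) is hypothetical. It also sorts the whole guess space in memory, which only works for toy inputs.

```python
import re
from collections import Counter
from itertools import product

def train(passwords):
    """Learn P(word) and P(digit suffix) from passwords shaped like letters+digits."""
    words, suffixes = Counter(), Counter()
    for pw in passwords:
        m = re.fullmatch(r"([A-Za-z]+)(\d*)", pw)
        if m:  # passwords that do not fit this toy structure are ignored
            words[m.group(1).lower()] += 1
            suffixes[m.group(2)] += 1
    total = sum(words.values())
    if total == 0:
        return {}, {}
    word_p = {w: c / total for w, c in words.items()}
    suffix_p = {s: c / total for s, c in suffixes.items()}
    return word_p, suffix_p

def ranked_guesses(word_p, suffix_p):
    """Return (probability, guess) pairs, most probable first.

    Each guess is scored as P(word) * P(suffix), so a common word with a
    rare suffix can be ordered against a rare word with a common suffix.
    """
    scored = [(word_p[w] * suffix_p[s], w + s) for w, s in product(word_p, suffix_p)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken alphabetically
    return scored

# Made-up training data for illustration only.
training = ["password123", "password1", "monkey123", "password", "football12", "monkey1"]
word_p, suffix_p = train(training)
for p, guess in ranked_guesses(word_p, suffix_p)[:5]:
    print(f"{p:.3f} {guess}")
```

Note that the ranked list contains guesses such as football123 that never appeared in the training data, because words and suffixes are recombined.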
And unfortunately, we can't use the phpBB list there, once again due to user privacy issues. Me cracking real passwords in front of an entire hacking audience might not be appreciated by our school and legal department. So, what we're going to be doing is running it against the MySpace list. This is a list that was disclosed about two years ago. And the thing about that is all the passwords were just in plain text. So, you could read them, so they've already been disclosed. Those users have already pretty much been hosed by now. So, we're not really putting them in any additional danger. But what we did, though, since cracking plain text passwords is pretty pointless, is we went ahead and we just hashed them using MD5, which is the exact same hashing algorithm that the phpBB list used. So, this pretty much simulates the phpBB attack here. And it's going to crack about 17,000 passwords. And we actually split the password list up into two different parts, as is typical for machine learning when you do these things. So, we have a training list and a cracking list that we use. And now you can see it's going off right now, and these are all the passwords that are being cracked right now. And you can kind of see it's going a little bit fast. It's a slower hashing algorithm. But it's switching between different rules. So, it's not just trying, you know, try one digit and try two digits and so on. It's trying, you know, some other digits, some exclamation marks, the number seven, and periods, and it's going between these. And it actually takes into account the password lengths as well, because people tend to make passwords that are six, seven, or eight characters long. There's actually a lot of other optimizations that we're looking at building into it, because this is definitely a work in progress. One thing that we're looking at doing is incorporating, you know, targeted attacks.
So, instead of trying to crack a big old password list, you're trying to attack one person. Well, all you need to do is just create another input dictionary of, like, their kids' names, their names, you know, birthdays, and so on. And then you just try those at a higher probability than all the other passwords there. So, that way, you'll still be trying all these common passwords, but also trying stuff targeted specifically to them. Now, you'll also notice it is starting to get a little bit slower there, and that's kind of by design. We'd love it if it just cracked everything immediately and went really fast. But the thing is that it is based upon a probability model. So, initially, it cracks passwords really quickly, because they're really, you know, easy-to-crack passwords. But as it keeps on going here, it's trying less and less probable password guesses. So, it takes more and more password guesses for it to crack a new password. Now, one other thing is that, just based on our grammar here, you can just kind of let it go, and it doesn't really run out of rules, because it generates literally trillions of different rules that it can use. And most of them it will never, ever, ever get to in any sort of password cracking session that you would have. Also, you'll notice that these are still very, you know, weak passwords here. And the reason is that we're just starting the password cracking session. So, it's not going to try, you know, really complicated or advanced passwords until much later. Now, if you do have, like, a password creation strategy, though, where you have to only attack strong passwords, the way you deal with this is that you just create your training list based upon only strong passwords. And that way it will start off by trying only strong passwords in the cracking session. So, I think that's pretty good there. So, let me go ahead and stop this.
Okay, so it ran for about two minutes and 30 seconds, and we cracked a little bit over 30%. This is actually one of, like, about a hundred different reasons, too, why I think that password change policies are retarded and stupid. Not in that order, maybe. But I really think they do more damage to security than they actually help, to tell you the truth. And especially when you're talking about offline attacks, unless you're changing your password every two minutes, it really doesn't do much. Okay, now, brute force. Everyone kind of puts this down. And the thing is it's really, really powerful. But you just have to use a little bit of brains with it. And we'll talk real quickly about a couple of the brief optimizations here. The very basic thing, though, is at the very least, you should be doing letter frequency analysis. If you're not, you're probably using Cain and Abel and picking your nose. And you can actually really get fancy with letter frequency analysis. I'm sure everyone's heard about it. For example, though, you can train on previous passwords. You can have different frequency analysis for the beginning and end of a password, because people tend to put numbers at the end and capitalize the first letter and so on. What's also extremely useful, though, is trying to figure out what letters not to try. For example, Q doesn't occur that much. So why bother trying it in your brute force algorithm? Especially since you're not trying to crack every password. You're just trying to crack a majority of the passwords. And with the letter Q, you would probably be attacking it with either keyboard combinations like QWERTY or dictionary-based attacks anyway. So you might as well drop that, and some of the other less frequent ones as well. Now, a much more advanced option, though, is Markov models. And what this does is it takes the conditional probability of letters into account.
The Scrabble example here is if you have the letter Q, you're pretty much hosed unless you have a letter U in your set as well, because U always follows Q. And that's all it does. It says if you're given this letter here, what's the next letter that follows it? And it's an extremely powerful way to brute force words that look like something that a human created. And John the Ripper uses an extremely powerful version of this that actually not only takes into account the next letter, but the letter after that as well. And it runs extremely fast. I'm just really impressed by that. I know I keep trying to sell John the Ripper here, and I apologize for that. But it really is a good program. There's a lot more logic you can put on top of that when using letter frequency or Markov models as well. And we kind of just call this a targeted brute force attack. And this is where you basically take the output of a brute force method and apply basic dictionary word-mangling rules to the password guess as a whole. So instead of trying uppercase letters everywhere in your password, why don't you only try uppercase letters as the very first character? Instead of trying numbers everywhere in your password, you can do it at the very end. And you have different rules for this, just like you would have in a dictionary-based attack. Also, you can take external knowledge into account as well. For this list here, people use phpBB a lot in their passwords. Well, I mean, if I already know what five characters of your password are, it makes it a lot easier to brute force. And there's actually a tool out there called Crunch that does a very good job with this. It's on the programming forum of the remote-exploit board. So here's a brute force attack. I'm not going to run this, but it's just to show you how you would try to attack a strong password with brute force.
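The "given this letter, what's the next letter" idea can be sketched in a few lines of Python. This is a plain first-order model for illustration only, not John the Ripper's incremental mode, which is more sophisticated. The function names and the `width` pruning parameter are assumptions.

```python
from collections import Counter, defaultdict

START = "^"  # hypothetical marker for "beginning of word"

def train_markov(words):
    """Count which letter follows which, including which letters start a word."""
    counts = defaultdict(Counter)
    for word in words:
        prev = START
        for ch in word.lower():
            counts[prev][ch] += 1
            prev = ch
    return counts

def ranked_next(counts, prev):
    """Letters seen after `prev`, most frequent first."""
    return [ch for ch, _ in counts[prev].most_common()]

def generate(counts, length, width=2):
    """Yield strings of `length`, following only the `width` likeliest letters at each step."""
    def walk(prefix, prev):
        if len(prefix) == length:
            yield prefix
            return
        for ch in ranked_next(counts, prev)[:width]:
            yield from walk(prefix + ch, ch)
    yield from walk("", START)

# Made-up training words for illustration only.
model = train_markov(["queen", "quick", "quiet", "star", "stars", "dog"])
print(ranked_next(model, "q"))
print(list(generate(model, 3)))
```

With this training data the only letter ever tried after q is u, which is the Scrabble point: the model never wastes guesses on combinations people do not produce.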
So the first thing I'm going to do is I'm going to run John the Ripper's incremental mode, because I don't want to have to go ahead and write my own Markov models. And it's just going to go ahead and generate the guesses based upon lowercase letters. And these are actually the very first words that John the Ripper will generate using this incremental mode. And if you read these, a lot of them look like words: starless, or dog, or star, or marine. And that's the wonderful thing about using Markov models. It generates things that look like they're in a dictionary. But it also generates words that are not in a dictionary, but that someone would definitely use, like s-tech. So it's really, really nice. Now, this still won't attack a strong password though. So for that, we need to apply some additional logic. So I just go ahead and pipe that into a script of mine. And all this script does is it just capitalizes the first letter, adds a special char to the end, and adds a digit to the end, in this particular attack. And now when we look at this, it looks like a real strong password here. Any of these will generally break a password creation policy. And I probably should get off my lazy butt and spend five minutes to add something that will reject words that are too small. But now we can go ahead and attack passwords that most people would consider strong passwords. And then we just go ahead and pipe this all right back into John the Ripper, since I don't want to have to deal with that, and just let John the Ripper deal with all the password cracking there. Okay, so that's enough of the phpBB list though. Let's go ahead and talk a little bit about the Web Hosting Talk list. It was originally hacked on March 21st, or at least that's when the list was disclosed. And the attacker was a real asshole. Really, there's no other way to put it there.
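A script like the one described above could look roughly like this Python sketch. It is a stand-in, not the script from the demo. The special-character set, the order (special character first, then digit), and the `min_length` filter, which is the "reject words that are too small" step that was never written, are all assumptions.

```python
from itertools import product

SPECIALS = "!@#$%"      # assumed set; the demo's actual set is not stated
DIGITS = "0123456789"

def mangle(words, min_length=6):
    """Turn lowercase brute-force output into 'strong-looking' guesses.

    For each word that is long enough: capitalize the first letter,
    then append one special character and one digit.
    """
    for word in words:
        word = word.strip()
        if len(word) < min_length:  # hypothetical size filter
            continue
        base = word.capitalize()
        for special, digit in product(SPECIALS, DIGITS):
            yield base + special + digit

# Stand-in for the words an incremental brute-force generator might emit.
candidates = ["starless", "dog", "marine"]
for guess in list(mangle(candidates))[:3]:
    print(guess)
```

In a real pipeline the word source would be standard input fed by the brute-force generator (for example, passing `sys.stdin` as `words`), and the printed guesses would be piped on to the cracker.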
But what they did was they posted over 200,000 user names, passwords, and a lot of other good information online. And then they proceeded to go ahead and delete the website and delete all the backups as well, which really sucked for the system administrators there. Now, normally, as I said before, I don't go ahead and audit salted password lists. They're just too much of a pain. There's better lists for me to go ahead and try to crack with my Dell computer. So the reason why I went ahead and tried to attack this list, though, was their response once they got their site back up and operational. Because they actually had to tell their users that they were hacked, because it's pretty hard to cover that up. So they gave a little post about how they'd been hacked, don't worry, we're secure now, and all that stuff. But they gave a piece of advice that I thought was really misleading and dangerous. And they said, and I quote, passwords are hashed with a salt. It would be an unprecedented event to reverse engineer our passwords. I change my password periodically though, so maybe today is a good day for that. I mean, that's like me saying, hey, guys, you know, the copy room is on fire. But don't worry, you know, there's a sprinkler system here. There's no way this building is going to burn down. But, you know, occasionally I like to go out for lunch, so maybe you should just maybe walk outside the building if you feel like it. And that's why I decided to go ahead and crack this list here. Now, they also posted online that absolutely no credit card or PayPal data was compromised. And so, apparently, the attacker had full access to their site, deleted everything and stuff like that. But that PayPal and credit card data, you know, who wants that? So, fail. Now, what was an even bigger fail, though, was that after they got it back up and operational, a couple weeks later, the site was hacked by the same exact guy over again. And they posted all the usernames and passwords online as well.
And they also posted, you know, credit card numbers. And I won't even go into this, but yeah, the site was saying like, well, they only compromised 2,000 credit card numbers, the rest we think are still safe. And I just really want to kind of state for the record here, I mean, people get hacked. I'm not giving them a hard time for that. I mean, as we've seen this last Wednesday here, even everyone in the security community seems to be getting hacked as well. Also, getting someone out of your system is really hard to do, especially when you have users yelling at you to get your system back up and operational right now. So what I am going to give Web Hosting Talk a hard time about is the way they handled disclosure of data to users. Because as I said, I felt that was really dangerous of them, because you do at least have to give users a sense that there is a ticking clock. They have to go out and change their passwords, not only for this site, but for all of their other sites, as quickly as possible. And that's really important to do. Now, one interesting fact that I just want to point out here is that if you compare the two lists from when they were posted online, only 1,348 users had actually changed their password in the couple of weeks between when the site got back online after it had been hacked and when it was hacked the second time. And that's less than 1% of the total users. That's 0.6% to be exact. So this kind of gives us a little bit of an idea of the window of opportunity for an attacker to go ahead and break into the site, or break these accounts and use those accounts against the site itself, or other sites as well. So the next thing, though, was to figure out whether this hash was really unbreakable or not. So I needed to go ahead and figure out what, you know, forum software they were using. So one Google search later, it was like the second option there, it told me that, yeah, they used the vBulletin forum.
And another Google search went ahead and told me that the vBulletin hash was just two rounds of MD5. So they take the user's password, they hash it once with MD5, they append a salt to that, and then they hash it again with MD5. So as I said, really, it's only two rounds of MD5 to go ahead and do this. And I couldn't figure out the option in John the Ripper to do this. So I actually have my own password cracker, and I implemented it in there. But the big question, though, was did they do anything special? Would I spend the next couple of weeks trying to crack these passwords, only to find out that I'd implemented it wrong, which happens all the time. So in order to test this, I tried the password "password" and ran it through. And with that, I immediately cracked 1,109 people. So, yeah, pwned. But I mean, what about that password salt? And I have to say, it really is a serious problem for an attacker, which is great for the defender, and that's why I keep on hyping this password salt here. Because every single user's password salt was different. And what that meant was that for every single guess I make, I have to go ahead and hash that guess for each individual user. And what that means is, if it took me one hour to go ahead and run an attack against the phpBB list, which was just one MD5 hash, it would take me around 200,000 hours to run that against this Web Hosting Talk list here. Because I'd have to run, as I said, the hash against each of those 200,000 users. And I've got a Dell computer. So that's a serious, serious issue. That being said, I still managed to crack one of them in about a week and a half. And the vast majority of these here came from just going ahead and trying lists of previously cracked passwords. In fact, for almost all of these, that's all I did. And actually, this was probably more effective to do on this list here than you'd most likely find elsewhere.
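Here is a minimal Python sketch of that hashing scheme and of why per-user salts multiply the attacker's work. It assumes the inner MD5 digest is hex-encoded before the salt is appended; check that against the forum software before relying on it. The function names and the sample users are made up for illustration, and this is not the cracker used for the talk.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def salted_hash(password: str, salt: str) -> str:
    """md5(md5(password) + salt), with the inner digest as a hex string (assumption)."""
    inner = md5_hex(password.encode("utf-8"))
    return md5_hex((inner + salt).encode("utf-8"))

def crack(guesses, users):
    """users maps name -> (salt, hash). Returns name -> recovered password.

    The first round depends only on the guess, so it is computed once per
    guess. The second round must be redone for every user because each
    salt is different, which is where the 200,000x cost comes from.
    """
    found = {}
    for guess in guesses:
        inner = md5_hex(guess.encode("utf-8"))  # once per guess
        for name, (salt, target) in users.items():
            if name in found:
                continue
            if md5_hex((inner + salt).encode("utf-8")) == target:  # once per user
                found[name] = guess
    return found

# Made-up accounts and salts for illustration only.
users = {
    "alice": ("abc", salted_hash("password", "abc")),
    "bob": ("xyz", salted_hash("hunter2", "xyz")),
}
print(crack(["password", "123456"], users))
```

Testing a single well-known guess such as "password" against the whole list, as described above, is also a cheap way to confirm that the hash implementation matches the real one.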
Because there's a significant overlap between the people that are interested in forum software like phpbb.com and the people that are interested in web hosting as well. So there were quite a few people where we saw the exact same password on these two different sites, which really helped us out a lot. But what I really want to stress here is this, because when I was just reading through all the different posts about this, there were a lot of people that would post their hash online. They'd say, look, I saw my hash in here, but don't worry about it, I chose a strong password. It has a salt. No one's ever going to go ahead and crack this. And the thing is I really want to stress to people that the salt does not protect individual users. So don't be an idiot and post your password hash online. If I was a real attacker, and I was doing this for malicious means, I wouldn't go ahead and try to crack every single person's password. I wouldn't try all 200,000 hashes. What I'm going to do is I'm going to look for all the people with admin or webmaster in their email address. And I'm going to attack those guys specifically, because there's a good chance that they'll use the same password on their site. Or at least I'll get a little bit of information about them to attack their site more effectively. And if I'm only trying a couple of people, two rounds of MD5 is not much. In fact, for a couple of people, you can get it down to about one, because you only have to do the first round once for everybody. So I can launch some very advanced attacks against specific users. And that's what you really need to keep in mind here. So even though the password list is salted, you do need to treat it with care. Cool. I actually got done in time, too. Thanks for coming to DEF CON. I hope you had a great time. I'll be available for questions and answers afterwards.