Hello, and welcome to Cracking Beyond 15 Characters for under $500. Some alternative titles for this talk were Greener Pastures over the Computational Wall, or Why XKCD Advice Is Easy to Understand and Hard to Do. That last one was suggested by my management, and we'll talk more about that one later. Hi, I'm Travis Palmer. I'm also a nerd, so I go by the pseudonym Travco. I am an OSCP, OSCE, and GMOD, if those letters mean something to you, and as is traditional for speakers, a couple of my favorite hobbies outside of security are on the right: D&D, ARMA 3, and BWC. Yes, again, I am a nerd. And these slides arguably have too much on them, so this is the point where I ditch my ugly mug so that we can happily reveal that picture from the Red Team Village last year. Yeah, let's just say that room was pretty tight. I'm glad we're not trying to do that again this year. Everybody would have COVID by now if we were trying to do this live. For this talk, we're going to start with a fair bit of background before we get into the meat of things. I should give a disclaimer: this isn't going to be an all-inclusive introductory talk. There is just too much to get through, and if you've never heard of Hashcat before and are thinking, where can I adopt one, this talk is going to be a little rough to follow. Just stick around. You'll still learn something, and you'll definitely have some keywords to Google later. So, let's address what should always be the first question. Why? Well, large web corporations are still using character requirements that were determined in 1985 by the Department of Defense to be resistant, only resistant, to online brute force attacks over a 1,200-baud modem, and even then they should have been rotated every couple months.
They also have complexity requirements that should seem familiar, because they have become pervasive and are largely responsible for a lot of people honestly believing that they can take almost any English word, add a number and an exclamation point to it, and have an acceptable password. Spoiler: these policies in isolation have not magically become more secure since 1985 and are still guarding some very sensitive things. Thankfully, most of these websites and companies have some additional safety nets like detecting questionable login origins or 2FA, well, except Wikipedia, which, well, doesn't. And, historically, it might have had one of the worst password policies on the internet. Thank you, Troy Hunt, for using a megaphone to broadcast the history here, which... I just... why? Your fix for zero-character passwords was a one-character requirement. There's also a large number of places where that 8-character standard, well, isn't, which frankly is a little baffling given some of the sensitivity here. PCI, the payment card industry standard, which only covers the systems where the payment card info is stored: this just seems lackluster. And on the other side of things, imagine what kind of information someone could collect from getting into your Facebook or Pornhub account. Imagine what they could do to your reputation by controlling it. And then there's eBay, where I shouldn't need to explain the level of financial ruin an auction site can bring if somebody else is bidding for you. Wells Fargo, a deserving punching bag perhaps, but they just haven't wised up. That's a 14-character limit. And is there seriously a database with plain text credentials limited to 14 characters in the background somewhere? Then there's Netflix. Does it make sense why hijacked Netflix accounts are so commonplace? I understand the want to make four-number PINs usable, but really?
Finally, we have a whole suite of devices in tech, and I'll pick on Cisco because I love them so much, that historically, by default, have no password, and the policy might get set by the people installing it. But you know, 8 characters suggested. This seems like a deeply questionable recommendation when, again, those passwords were determined to only be resistant to attacks on networks in 1985. And while I'm on the trend of bashing trustworthy sources, it's only proper to talk about NIST, the National Institute of Standards and Technology, and what they say, given that what they say often seeps into government regulation. Special Publication 800-63B, released in 2017, said that the requirement should actually be 8-plus characters with no complexity, because someone did the statistical analysis and found out that people actually produce easier-to-brute-force passwords under complexity requirements, like adding a number and an exclamation point. They also say the maximum should at least be 64 characters. Antivore things are good and might be going. There's also a key section here that says increased password length is a key security control and to encourage passphrases. You hear that? The future is coming, slowly, and the 80s might finally end soon. The most recent guidance from NIST and the FBI, from February this year (mind you, this was not the first time it was published), is, well, a lot more explicit. 15 characters without other complexity requirements is where we should be going. In fact, we should require it as soon as possible. And this is in line with what a bunch of other experts and sources that aren't the largest tech companies have been saying for much, much longer. In fact, a security consultant group that I greatly respect, Black Hills Information Security, makes this suggestion to their clients and uses a more paranoid standard of 20 characters internally. Neat, though you might be asking, okay, why is Black Hills specifically on here?
Well, it's because I'm picking on them, and we're about to listen to some snippets of audio from a webcast last December. Mind you, this is a little cut up and out of context, but don't worry, most of the context is still there. Seven characters is just easy to crack. How easy? If you go to the next slide, my son's computer, he's got a gaming computer. It takes eight minutes to crack an LM hash. If you take the same 14-character password, it would take 4.3 billion years. Oh, we don't talk. This is kind of what I'm talking about with regards to the password policy that they set in 1985. I mean, in 1985, an eight-character password policy was secure forever. Because, or, you know, 90 days. 90 days, okay. So it was secure for long enough for them to be, we don't need to do anything. But, I mean, as technology changes, so do the password policies need to. Hey, this is warfare, right? Everything is move, countermove. We have electronic countermeasures. You have electronic counter-countermeasures. And then it goes up above that and I lose track. So we're getting questions about, have you run the Google common word attack against passphrases? So, like, if you just load words into a cracker, then doesn't that crack passphrases? Sure. Yes, it does. How many combinations of four-word passwords are there? So how long is that going to take you? What if you start using words from a foreign dictionary? What if you start putting salt into those words? But I just, I think if four words, and if you start using special characters as a spacer between your words and things like this, the question is, are you more secure than you were with eight? But yes, there are attacks against everything. So, so Darren kind of summarized this up, this from the field, kind of our results that I just did want to talk about. Our success with people who have 15-character passwords approaches zero. And we have tried these things. Now, we're constrained in our time.
Normally, we don't have more than a week or so to do our password guessing. Whereas an attacker, according to the Verizon report, they've got like, what, nine months, something like that? Okay, another question, Jason. Yes, so this question's come up a few times. Are spaces legal? Yes, they are a special character and they are very good. And it depends on the... And I'm going to cut it there. So they have some pretty solid reasons, a couple of points of recommendation. But perhaps the most interesting thing mentioned is that they have been given hashes from a domain with a 15-character passphrase policy, specifically to crack them. And the amount they can crack in a week approaches zero, basically no success. There's also that mention that spaces are a special character. We'll get back around to that one. They are a character, certainly. In any case, approaching zero success. Pack it in. Presentation over. That's all, folks. No, you're not buying that? Well, good. Because neither am I. And I should address the final reason I'm here presenting on this topic in particular, which is the CISO of my company. But so getting on to password policy, so taking that innovative incubator mindset and looking at passwords. So passwords are something that's highly audited. Because it's very easy to do. If you're an auditor and you're trying to assess whether somebody's passing muster, you can have this rule set and you can walk in and either they pass or they fail. And you'll see a lot of that. And a lot of the companies have password policies that come straight out of audits. Eight characters and three or four of uppercase, lowercase, special characters and numbers. I just rattle it off. It's been almost 20 years of that. So we looked at that, though. And as we went through the red teaming, we found that one step in what we call the kill chain, if an attacker were successful, and we actually give them a leg up. We actually bring hackers on and bring them in as fake employees.
We give them a laptop and a password and they start. And we found one step in a kill chain was taking all the passwords and cracking them. And they would run computers with graphics processing unit augmentation, and they'd run for hours. And they'd be able to crack a few passwords, and hopefully they'd find a privileged account. And then they were off to the races with that. So while that was only one step in what I call the kill chain, we just said, we want to win that battle. And here we are, compliant, but it's just not getting it done, just meeting compliance. So we ran some math on the whiteboard and we said, well, what if we went out to a really long password, 15 characters, but got rid of all the complexity requirements? And I have a lot of people with high math SAT scores on my team. Let's put it that way. So we had some great whiteboard battles about how much better it would be and that sort of thing. But what kept creeping in there, because of kind of that tick-box mentality, was, well, we have to have complexity. Everybody's expecting that. We have to have uppercase and lowercase and on and on. But we said, well, let's just try it out. So the math held and we felt pretty good about it. And then around that time, the National Institute of Standards and Technology, NIST, released an updated standard. And they said, well, length is king. And if you can get a longer password, you can get rid of some of that stuff as long as you look for commonly used passwords and block them out. So long story short, we did it. And it took about 90 days to roll it in. And the impact has been substantial. And the only reason we're able to do that is because we innovated and created this thing that we called the Kraken, that every single day tries to crack all the passwords in the company. And we see the success rate of that machine just dropping precipitously. We saw it. And I'm going to cut it there. So yeah, that's out in the public domain.
And we need to talk about that math Jerry mentioned. See, the math might hold up, but the equation isn't realistic. There's more than one way of approaching this problem, and I've heard the argument spun multiple different ways, the first of which is that it's inconceivable to crack a 15-character password because brute forcing through all the possible numbers and lowercase letters is 1.66 billion combinations. Another argument I've heard, which comes from a better standpoint, assumes the passphrase is made up of words. There are a lot of words around five characters or more, so you can say a 15-character passphrase is probably three words, which is extremely difficult. You'd have to go through more than 50 quadrillion options to be sure. Okay, big number, but certainly smaller than the last one. And the last one I've heard, perhaps the most educated of these strawman arguments, is based on the common requirement of eight characters: doubling that gets you just above the 15-character requirement. So maybe we should think about passphrases as just multiple passwords combined, in which case the lowest margin for cracking all of the combinations of two passphrases is just the combinations of two common passwords combined, which, if we're using the RockYou dictionary, is still over 205 trillion, which they'll tell you is unreasonable. Well, I'm here to tell you that all of these arguments against cracking past 15 characters, regardless of reasoning, can be undermined, and they can be undermined with only three factors. What are those factors, you might ask? Are you just teasing us? Is this a timeshare scheme in disguise? No, the big secret weakness is humans. Inconceivable! I know. Specifically, humans are bad at big numbers; humans pack complexity on the ends, the ends of passwords or passphrases; and humans pick similar things, similar things to other humans. Common things are common because they're commonly picked.
Before I get too far, we should define some limits on what is reasonable, and I should address the other part of the clickbaity title of this talk. Why the $500 limit? Well, besides it being a nice round number that is easy to pitch, it's also the roundabout cost of an entertainment system, and you can safely assume that every type of threat actor you might need to be worried about, including LulzSec and the script kiddies, has access to this amount of money. It's also an amount that can pack a sizeable punch regardless of the path you take, either owning a cracking rig or renting out cloud computing. $500 is more than enough to go off and build a small cracking system out of a GPU from eBay and some outdated desktop parts, which you might have found in a dumpster, which should net you a theoretical max of 53 gigahashes against NTLM. As for the cloud, in AWS, $500 will get you 68 hours on a spot instance with 8 V100 GPUs, which, yes, might get preempted, but probably won't. You can set that up manually, or you can use a management tool like Coalfire's NPK to manage the spot instances for you, for both scale and automation of the attack. If you're an attacker and what you're planning on doing is a series of short-running attacks, instead of waiting 68 hours to get all the results, why not just spin up 68 instances and get the results in one hour? Sadly, NPK doesn't work with Google Cloud, which actually appears to be cheaper, and if you squeeze out everything you can, you can get 83 hours of time on an instance with 8 V100 GPUs, which, besides being terrifically cheap, brings us to a terrifying theoretical max of 189.3 petahashes of NTLM we can guess. Not mega, not giga, not tera: peta, 189 quadrillion. If we compare that to that last strawman argument of the RockYou dictionary on top of itself, sure, theoretical isn't real performance, but the difference is off by a factor of about a thousand. Although, humans are bad with big numbers.
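To put those big numbers side by side, here is a minimal sketch of the arithmetic. The 189.3 petahash budget and the 83 hours come from the talk; the RockYou entry count of 14,344,391 is an assumption (the commonly cited line count of rockyou.txt), and the variable and function names are made up for illustration.

```python
# Back-of-the-envelope budget arithmetic. Only the 189.3e15 guess budget and
# the 83 hours are figures from the talk; everything else is an assumption.

ROCKYOU_ENTRIES = 14_344_391   # assumption: commonly cited size of rockyou.txt
CLOUD_BUDGET_GUESSES = 189.3e15  # theoretical NTLM guesses for ~$500 (from the talk)
CLOUD_HOURS = 83                 # hours on an 8x V100 instance (from the talk)


def implied_rate(total_guesses: float, hours: float) -> float:
    """Average guesses per second needed to spend the whole budget."""
    return total_guesses / (hours * 3600)


# Every RockYou entry combined with every RockYou entry.
rockyou_squared = ROCKYOU_ENTRIES ** 2

# How many times over the budget covers that two-password keyspace.
budget_ratio = CLOUD_BUDGET_GUESSES / rockyou_squared

rate = implied_rate(CLOUD_BUDGET_GUESSES, CLOUD_HOURS)

print(f"RockYou squared:      {rockyou_squared:,} candidates")
print(f"Budget / keyspace:    {budget_ratio:,.0f}x")
print(f"Implied guess rate:   {rate:.3e} hashes per second")
```

Under these assumptions the theoretical budget covers the RockYou-on-itself keyspace several hundred times over, which is the gap the "unreasonable" argument misses.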
Which brings us around to the second strawman argument. I'd like to throw in an alternative equation, still very conservative, for how many guesses it takes to crack a three-word passphrase: all of the words that aren't obsolete, to the third power, divided by the number of people in the organization or the number of hashes an attacker has to crack, all of which is divided by the factor of human laziness squared. How do you quantify human laziness, you might ask? Don't ask me, but trust me, the value of that variable always seems to be greater than one. This is a somewhat joking equation, but the very conservative hypothesis here is that people making a passphrase will use words from languages they know and can think of under the duress of password creation, which tends to be a much more limited set. In fact, I'm going to say now and support later that the pool to pick from for a lot of people seems to be between 32,000 and 64,000 words, which brings me to a point where we are going to need to break out a lot of math, because of underminer number one: people are bad at big numbers. Thankfully, people are good at spotting trends after they have all the information, so let me give you something that is easier to parse, or hopefully easier to parse. This is a chart of the actual computational limits of combining words together in an attack on a single 2080 Ti on NTLM hashes, which I picked because it's a consumer GPU and I have access to one. The horizontal axis of this chart is the number of words in a passphrase. The vertical axis is the size of the dictionary or, perhaps a more useful way to think about it, the rarest word in a passphrase that can be cracked, because a logical attacker is going to use a list of words sorted by frequency. The bigger that list is, the rarer the words are going to be that are in it.
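A chart like the one described can be approximated with one formula: dictionary size raised to the number of words, divided by the guess rate. The sketch below is not the speaker's spreadsheet; the 26 billion hashes per second per GPU is an assumed NTLM rate, and the function name and the dictionary sizes chosen for the rows are hypothetical.

```python
# Rough reconstruction of a "computational wall" table.
# ASSUMED_RATE is an assumption, not a measured value for any specific GPU.

ASSUMED_RATE = 26e9  # assumed NTLM guesses per second on one consumer GPU


def minutes_to_exhaust(dictionary_size: int, words: int,
                       hashes_per_second: float) -> float:
    """Minutes needed to try every ordered combination of `words` dictionary entries."""
    return dictionary_size ** words / hashes_per_second / 60


if __name__ == "__main__":
    word_counts = (2, 3, 4)
    print("dict size".rjust(12), *(f"{w} words".rjust(16) for w in word_counts))
    # Rows double in size, like ranks in a frequency-sorted word list.
    for exponent in range(10, 25, 2):
        size = 2 ** exponent
        cells = (f"{minutes_to_exhaust(size, w, ASSUMED_RATE):.3g}".rjust(16)
                 for w in word_counts)
        print(f"{size:,}".rjust(12), *cells)
```

Because the exponent is the word count, each extra word multiplies the work by the full dictionary size, which is why the jumps between columns look like walls rather than slopes.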
Now, all the numbers in the colored boxes are the number of minutes required to complete an attack's search space, and the coloration tells you how long a time that is relative to minutes, hours, days, weeks, months or years, because I'm not going to lie, I don't know off the top of my head how many minutes are in a week. Going down or to the right exponentially increases the amount of work needed, and there are a lot of spots here where the jumps in difficulty are sharp, and this is where the idea of a computational wall comes in: a seemingly sudden increase in difficulty that you can't go over. But of course, this is only the chart for a consumer GPU. Well, what about a V100? Well, it's not actually that much different. There are a lot of reasons why a V100 should be much better for cracking generally, but it isn't a variable for this particular chart. Anyway, what if we have more than one GPU? Why not eight? Well, the chart is going to shift down a little bit and open up some options. Those of you that have noticed the size of the numbers we're dealing with on the vertical axis probably already have some nightmares to take home, but I'm going to make sure everyone else has them too. In the two-word column, that is more than 16 million words in 20 minutes. Basically, every word actually in use from every language spoken by at least 1% of the world's population fits in here, with room to spare for a couple extra million common passwords. In the three-word column, that is every non-obsolete word in the English language, and it's testable in a matter of hours. In the four-word column, we have a sizable chunk of commonly used words, where a targeted attack, say, from scraping the memos and websites of the target, makes sense. And before I get too excited, I'm sure someone is thinking, okay, cool, that's just NTLM. What about a slower, actually secure hash, or quote-unquote secure hash? Well, here's that same chart for sha512crypt Unix passphrases.
And you'll notice there are still some viable attacks on two- and three-word passphrases. And here's bcrypt, or Blowfish, configured with the Unix defaults, albeit these are some pretty weak Unix defaults; I think Ubuntu has gone well past this now. In any case, we lose a lot of capability, but even when that difficulty curve is ramped way the heck up, we still have options in the two-word passphrase category. As an attacker, this chart makes me really sad, but the reality of most corporations using Windows somewhere in their infrastructure is that there will be, somewhere, something using a hash like NTLM, which isn't going away for quite a while to come. And as far as individual users that are concerned with what they need to have as a passphrase, well, they need to make sure what they have isn't going to get cracked when the hashes get dumped, either inside an institution or when a website gets all of its passwords dumped by SQL injection. Let's be real, a lot of major websites and companies keep on getting caught with their pants down. And we only find out they're using a fast hash or plain text the hard way. Not to mention, the difficulty of computing a hash is a linear factor in a world where computing power increases exponentially. So I'm going to go back to the other chart, because it's time to get into the attacker mindset and play a game of bad recommendations. First up is Google, and yes, the advice is old, but the advice, much like their policy, hasn't been updated in more than a decade. So, I love sandwiches. It's not a great example, and that mix of leetspeak and case shifting doesn't do much for the difficulty. I'll be generous and say that it makes it about 100 times harder. And in terms of the rarity, this slide shows where the various words fall on Google's own most-searched list, that's the G number, and the Wikipedia frequency list, which would be the W number. The fact of the matter here is the underlying phrase is crackable in under three minutes.
And even giving the benefit of the doubt on the difficulty of guessing leetspeak substitutions, that's crackable in 244 minutes. I should probably also mention the recommendation put forth in the absolute latest NIST and FBI guidance, because they suggested the passphrase VoicesProtected2020WeAre, and then suggest a passphrase that is even better, at least according to them, which is DirectorMonthLearnTruck, because the words are unrelated. Well, those words might seem unrelated, but they all have one thing in common, and that's that they're all eye-searingly common. The rarest word in there is truck, and hoo boy, that is in the top 4,000 regardless of which list you choose. Four common words are not safe from offline cracking, which does bring up another, perhaps more viral recommendation from XKCD 936. Now, a lot of people have used this as password advice, including the management at Intercontinental Exchange, and the fact of the matter is the math in the XKCD advice as written is actually fine, because it was written to handle an online attack where the rate of guessing is only a thousand a second. Now, there's a claim here that the average user shouldn't have to worry about attempts to crack a stolen hash, and I'm here to tell you from experience, the average user reuses passwords. The average user also works at a company, and if that company is larger than a couple dozen people, they should also worry about stolen or dumped hashes, both from their own infrastructure and from the websites their users are using both professionally and personally. Four common words aren't safe from offline attacks. If anything, the XKCD example here is a little stronger than intended, because staple isn't actually that common; it's number 11,363 on the Wikipedia frequency list and completely off the end of the Google list. Then there's the level of entropy XKCD says a common word has.
If we say common words are words that are in the first 2,048 of any given word list, then the attackers, well, they're gonna win all the way out to six-word phrases. Yes, really, six. Like, good luck, diceware users. This is nasty. There is no way out of this computationally by just throwing on common words. Mind you, there are some caveats. The real world isn't as simple as an Excel sheet, so before we get into real-world results and the mechanics of attacks, we should very quickly cover the things that are gonna be ever present when attacking these passwords using a GPU. Yeah, this is gonna get real technical real quick. The first of these is the dictionaries and the size of the dictionaries. There's a lot of bandwidth to go around for operations within a GPU and its memory, but transferring data to the GPU, well, PCIe Gen 3 x16 is only 16 gigabytes per second, max theoretical, not real world. That might seem like a lot, but while we're dealing with 16 billion theoretical bytes per second, modern GPUs can do 26 billion real hashes of a passphrase per second, and one of these numbers doesn't fit within the other, and also is real, not theoretical. That PCIe bandwidth is not going to match up with the hash rates. The other limitation to be mindful of in terms of what kind of attacks are possible is the nature of the CUDA (Compute Unified Device Architecture) cores in the GPU, or the AMD equivalent, I'm not gonna get into it, and what they can do, which is, well, not a lot actually. They were built to do lots of single instruction, multiple data computations on matrices and vectors for graphics, which means lots and lots of units that can do arithmetic, ALUs, and not a lot of space or silicon devoted to controlling them.
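The mismatch between bus bandwidth and hash rate is easy to check with a few lines. This is a sketch under stated assumptions: the 16 GB/s and 26 billion hashes per second are the figures quoted in the talk, while the 16-byte average candidate size (a 15-character passphrase plus a newline) and the function name are illustrative assumptions.

```python
# Can the PCIe bus deliver candidates as fast as the GPU can hash them?

PCIE_BYTES_PER_SECOND = 16e9   # PCIe Gen 3 x16, theoretical max (from the talk)
GPU_HASHES_PER_SECOND = 26e9   # quoted real-world passphrase hash rate (from the talk)
AVG_CANDIDATE_BYTES = 16       # assumption: 15 characters plus a newline


def max_candidates_per_second(bus_bytes_per_second: float,
                              avg_candidate_bytes: float) -> float:
    """Upper bound on candidates per second that can cross the bus."""
    return bus_bytes_per_second / avg_candidate_bytes


fed = max_candidates_per_second(PCIE_BYTES_PER_SECOND, AVG_CANDIDATE_BYTES)
utilization = fed / GPU_HASHES_PER_SECOND

print(f"Candidates the bus can deliver: {fed:.2e} per second")
print(f"Share of GPU hash rate fed:     {utilization:.1%}")
```

Even with the theoretical bus speed, candidates streamed from the host keep the GPU only a few percent busy under these assumptions, which is why attacks that generate candidates on the GPU itself matter so much on fast hashes.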
The nature of single instruction, multiple data means groups, or in this diagram rows, of ALUs all need to be doing the same type of thing at the same time, and those instructions and a significant portion of the data need to fit in a smaller shared cache, and oh right, the instructions better be simple and something it can actually do. If that something isn't exactly implemented in Hashcat or John the Ripper, often the best bet is to find a workaround. Also, thank you, Hashcat and John the Ripper devs, for making it easy to use these devices for cracking. Okay, let's talk about techniques and results. First, dictionary attacks: basically checking through a massive list of passwords in a file, with or without the GPU generating some additional candidates to check, though more on rules later. At Intercontinental Exchange, we've been running a dictionary attack during the period that Jerry mentioned in that soundbite, while we rolled out the 15-character policy, and as you can see in this graph of the average length of a cracked password, the policy wasn't just successful, it caused some rapid jumps in length when it started to be enforced, and after implementation the average length evened out despite some outlying spikes. But this is the less interesting half of the story. This is the graph of the percentage of passwords that were cracked by dictionary attacks, only dictionary attacks. Yes, that number in the upper left is 50%. Dictionary attacks are very good when you have a dictionary of passwords made under the same password policy that you're targeting, and yes, dictionaries also aren't very good after people change their passwords under a completely different and significantly changed policy and start picking things that aren't going to be in the dictionary anymore.
In fact, if we overlay the two graphs, we see multiple downward shifts in the number of cracked passwords, even after the average length of cracked passwords is above 15 characters, as further enforcement around cracked passwords was rolled out, until it settled down around the 1% range. Mind you, it didn't approach 0%. It held near a certain percentage. There is always somebody picking and repicking a terrible password. These charts are trimmed and don't show anything that has happened recently, but I assure you straight dictionary attacks have not had much success, though I was hired around this time. So let's talk combinator attacks, attacks centered around combining different dictionaries or the same dictionary onto itself. The attack here is pretty straightforward and we've kind of already talked about it, so let's get right into the results. And, well, here's a chart of every passphrase longer than 15 characters ever cracked at Intercontinental Exchange that was crackable with a small dictionary of words or common passwords, organized to reflect the charts from the computational wall part of the presentation a little bit earlier. In order to do this, I had to modify an algorithm for dissecting passwords called zxcvbn to be able to actually properly handle common separators, like spaces between passwords in a passphrase, because apparently the creators of zxcvbn didn't think that was important to handle.
Anyway, I digress. A password lands here based on how many elements, usually words but sometimes a common password, zxcvbn believes it is made up of, and the rarity of the rarest element based on the number of guesses required, which should roughly dictate the size of the source dictionary required. You can see there's a rather precipitous drop-off somewhere between 32,000 and 64,000, along with some pronounced preference for three-element passphrases, which might have had something to do with certain recommendations packaged with a policy. Now this is neat and all, but what about other datasets, say, not from this company? Well, I considered calling up various companies with 15-character policies and seeing if they were willing to give me their hashes, but it didn't work and it seemed pretty unlikely, so I had to artificially make one. So this is the result of over 37 million hashes from the Have I Been Pwned version 2 collection, where every password known to be less than 15 characters long has been stripped out, which is really only possible because over 90% of this list has already been cracked. Both of the attacks here are pretty shallow and short running, in large part because I only needed to prove a point, and also the hash rate is artificially terrible when cross-checking 37 million hashes for matches. That being said, this is 1.14% of all the passphrases in the dataset in less than three minutes. Scared yet? Good. Because this is an example of what happens when you use a lot more candidates and check combinations of passwords. If you still remember that third strawman argument, this is all of the RockYou word list on top of itself. Sure, it takes a day and 16 hours on some piddly GPU horrifically bogged down by 37 million hashes to crack. Normally, in cloud infrastructure, this would take 16 minutes for a single machine in AWS or Google Cloud with those 8 GPUs and would cost single-digit dollars.
Now that I've got the would-be crackers excited, there are some technical shortfalls with Hashcat that also need to be noted. The combination mode only takes two files as input. You want to use this mode so Hashcat gets to load as much as it can into GPU RAM ahead of time, which means if we want to check three- or four-element passphrases, you need intermediate dictionaries, being mindful of the exponential file size. It's a similar deal for the hybrid mode, which can only take one word list and one mask file. You could use other programs and pipe in candidates, but you pay the piper when doing so on a fast hash. The other thing Hashcat isn't going to do for you is generate variants of passphrase candidates with all the spaces and capitalization that people generally use. From what I've seen, you're going to want title case, sentence case, and all lowercase, both with and without spaces. Now, I do not consider spaces a special character anymore. The easiest way to do this is with individual rules on a combinator attack, starting with a pre-processed dictionary that already has the spaces between capitalized words. Then you can manipulate it easily enough to get the spaces and capitalization you desire. For Hashcat, j corresponds to the left word list and k corresponds to the right list. Which, on the topic of rules, is another shortfall. You can only do one set of rules per word list in combination mode. Yes, sorry, if you want to check lots of different things using rules, like common words in between two words, you're going to need a honking massive and ugly bash file or program to manage it for you. This is actually what TrustedSec's hate_crack is doing behind the scenes for what they call middle or thorough combinator attacks. It's basically spawning off tons of Hashcat command lines. All right, enough about conditions of combinators.
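To make the casing and spacing variants concrete, here is a small standalone sketch in Python. It is not how Hashcat rules work internally and not the speaker's tooling; the function name `passphrase_variants` is hypothetical, and it simply enumerates title case, sentence case, and lowercase, each with and without spaces.

```python
from typing import List, Sequence


def passphrase_variants(words: Sequence[str]) -> List[str]:
    """Return the casing and spacing variants people commonly use for a phrase.

    Produces title case, sentence case, and all lowercase, each with and
    without spaces, de-duplicated while preserving order.
    """
    lower = [w.lower() for w in words]
    title = [w.capitalize() for w in lower]
    sentence = title[:1] + lower[1:]  # only the first word capitalized

    variants: List[str] = []
    for casing in (title, sentence, lower):
        for separator in (" ", ""):
            candidate = separator.join(casing)
            if candidate not in variants:
                variants.append(candidate)
    return variants


if __name__ == "__main__":
    for candidate in passphrase_variants(["correct", "horse", "battery"]):
        print(candidate)
```

Six variants per phrase multiplies the candidate count sixfold, so on a fast hash it is cheaper to produce them on the GPU with rules than to write them all into the intermediate dictionaries.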
Let's move on to PRINCE attacks. PRINCE, despite the name, has nothing to do with the artist formerly known as Prince. It's an acronym for PRobability INfinite Chained Elements, but it is pretty simple in purpose: take the path of least resistance and find more passphrases sooner by outputting shorter-length candidates first. Not only is the search space smaller, but we can also expect to find more passphrases here, because people tend to pick passphrases close to the minimum required effort. This overlap of least-resistance motivations is just great for an attacker. How does PRINCE work? Well, it ingests a word list, sorts all the contents by length and puts them into separate lists, and then begins producing candidates at the minimum specified length out of the lists of the various lengths, iterating through all possible combinations of lists until the options are exhausted, at which point it moves on to making longer candidates. Now, after that explanation of how PRINCE works, you might be wondering, um, why not just use combinators of length-cut lists? Aren't PRINCE and Hashcat separate programs? Won't you have to pay the piper? Didn't you just say that transferring all that stuff over PCIe was terrible for performance? Well, yes, you do have to pay the piper, and you could manage all of these attacks manually, but it gets pretty crazy to manage all the lists and combinations for three- and four-word elements. It's doable, but you really need another program or script. There's a solution that circumvents this, and it comes back to one of the undermining factors.
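The length-bucketing idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the real PRINCE processor: the actual tool also orders chains by keyspace and has its own limits, and the names `length_compositions` and `prince_like` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple


def length_compositions(total: int, available: List[int],
                        max_elements: int) -> Iterator[Tuple[int, ...]]:
    """Yield ordered tuples of element lengths from `available` that sum to `total`."""
    def recurse(remaining: int, prefix: List[int]) -> Iterator[Tuple[int, ...]]:
        if remaining == 0:
            yield tuple(prefix)
            return
        if len(prefix) == max_elements:
            return
        for length in available:
            if length <= remaining:
                yield from recurse(remaining - length, prefix + [length])
    yield from recurse(total, [])


def prince_like(words: Iterable[str], min_len: int, max_len: int,
                max_elements: int = 4) -> Iterator[str]:
    """Emit chained candidates, shortest total length first (simplified sketch)."""
    # Step 1: split the word list into one list per word length.
    by_length: Dict[int, List[str]] = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)
    lengths = sorted(by_length)

    # Step 2: for each target length, walk every chain of lists that fits it.
    for total in range(min_len, max_len + 1):
        for chain in length_compositions(total, lengths, max_elements):
            for combo in product(*(by_length[n] for n in chain)):
                yield "".join(combo)


if __name__ == "__main__":
    for candidate in prince_like(["a", "b", "cc"], min_len=2, max_len=3):
        print(candidate)
```

Because the outer loop runs over total candidate length, everything at the policy minimum is exhausted before any longer candidate appears, which is the property that lines up with users picking passphrases just over the minimum.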
This is the position of dictionary elements in passphrases. Larger numbers here mean more passphrases contained an element straight out of a dictionary in that position. There is one heck of a disparity between the last position in a passphrase and the rest of the passphrase, because humans pack complexity on the ends: mostly the end, sometimes the beginning, extremely rarely the middle. So the fix for PRINCE's terrible hash rate on fast hashes when it's piped in? Rules. Because you can use the default Hashcat cracking mode, you can specify a rules file. Any problems? Well, yeah. Most rule lists are made by people on the internet and aren't long enough to make use of the GPU's spare resources, or are more focused on modifying passwords and not on altering passphrases, like adding suffixes or prefixes. So I've made one more focused on common prefixes and suffixes (you know, your 2020, your exclamation point, et cetera) with 4,175 rules, which is up in the Rephraser repo I'll mention later. Soon to come, I'll also be releasing a prefix and suffix list, which probably is already up by now, based on the frequency data of those cracked passphrases mentioned earlier and the Have I Been Pwned version 2 list.
Okay, let's talk success rates for PRINCE. Now, the kicker here is that we're making use of that rule list I mentioned to catch passphrases with common suffixes and prefixes. There are no single-element passphrases here. We can see a repeat of an earlier trend, namely a sudden drop-off between 32,000 and 64,000, with some clustering around three elements including the suffix or prefix. You've got to remember, all of these are being run with the rules. So this works out pretty well for sussing out passphrases, at least with intercontinental exchange. What about our much larger data set? Well, we can also see the reason why people like using PRINCE attacks. Both of these attacks were using the RockYou password dictionary as the source, and
both were stopped at the same time, at one-eighth the runtime of the RockYou-squared combinator (RockYou on top of itself). That complete attack I mentioned earlier took one day and 16 hours on our piddly setup and got 5.82 percent. These attacks managed to concentrate a significant portion of that success very early in cracking. The other comparison here is the performance when restricting the number of elements PRINCE is allowed to use in the attack. At least in this case, there was a limited but significant advantage to focusing only on three-word-element passphrases because, again, we're also using that rule set mentioned earlier, with the prefixes and suffixes, on the PRINCE candidates. I'll also say again, this was at a really abysmal hash rate. We were barely reaching one gigahash when doing this, and these attacks can be much, much faster when you're not drowning your GPU in 37 million hashes to check.
And speaking of fast, it's about time I get to the new and shiny: word-level Markov chains, for going after passwords that are at their core phrases. So what are word-level Markov chains? Well, the simplest explanation I can give is that it's an overglorified state model to store relationships between groups of words. For this example, we'll just do a 2-gram model; think two elements in the state. If the phrase used for training is "the cows eat grass", a 2-gram model will record state transitions for "the cows" to "eat" and "cows eat" to "grass". Obviously this isn't a terribly useful model by itself, but imagine if it had some more training data: say hundreds, thousands, or millions of sentences. The intent is to build up a model of the relationships between groups of words. If we want to be more accurate, we can also increase the number of elements in the state, raising the n-gram order of the model to three or beyond, though research generally seems to show that three is probably the point where you should stop. Now, there is one pre-existing proof of concept I
could find that does more or less this and actually would generate output fast enough to be useful in password cracking. Sadly, it was single-threaded, written in a mix of C# and C++, and designed with a Windows compile target in mind. That simply wouldn't do for my purposes, so I created something similar in Python that is parallel, uses existing libraries, and is naturally more multi-platform. So how well does it do? Well, the astute members of the audience, when I showed this table on the right of every passphrase cracked that was combinator-findable, may have noticed some numbers that were a little too far into the lower right based on the computational complexity charts we went through before. Let me color that in for you. Now, the very astute viewer might also infer that I did not in fact have unlimited access to eight V100s, and also that the chart on the right is strangely positioned, almost as if it extends further to the right. Well, it does, and I had a single 2080 Ti for my research. I was also heavily time-boxed. Everything on here that isn't a shade of green is not from combinator attacks, flat out. Admittedly it's not a deluge of results, but come on: years, literal years, of the combinator attack would not have yielded some of these. And the couple out in the six- and seven-element columns? Do you have any idea how long 10 to the 13 is? Because that's what it corresponds to on the difficulty computation. It's millennia. So certainly, I'm very happy. But what about our much larger data set? Well, a similar story. These attacks are very quick when doing only a couple of words, but admittedly don't pull in too much, only tens of thousands. The other factor likely at play here is that there are just not that many passphrases that are also phrases that exist in the data set with four words. While we're here, I would like to point out the accuracy trade-off of choosing a 3-gram over a 2-gram model. You can see that the 3-gram got a little over half as much, but it only took one-twelfth
of the time. That is the accuracy trade-off. Now, before people get too uninterested, I should show off the really impressive thing for me, which is what happens when you put these to use on even longer passphrases. There still shouldn't be much out here in the search space, but people seem content to use phrases at this length, possibly because these should be safe according to everybody's advice. And these aren't even common phrases; they're just kind of normal phrases. Here are some examples of what isn't safe anymore. These probably aren't the passphrases you expected to be known or crackable, and certainly not crackable in one, two, or nine hours at speeds barely over one gigahash a second. Do you have nightmares yet? Because everything here was generated by a Markov model and a simple Hashcat rule set. No dumped passwords as input, no cherry-picking shenanigans, and the training data was completely agnostic to the target passphrases. No targeting of any kind. Admittedly, it does produce some strange candidates sometimes, and I've collected some of the weirder ones I've seen on the right here. I don't know how it went from intrauterine pressure catheters to kiwi or kiwis. I can only assume one of the websites scraped for the training data was a blog post about making kiwi or kiwis with an embedded Catheter Cowboy advertisement. I really don't know.
Now, I should head off some common questions I'm anticipating. Some of you familiar with Hashcat might be thinking: doesn't Hashcat already have Markov chains? Why can't we just do this? Well, it does. It has a premade model for character-level predictions, which, when you start going out to 15-character passwords and what you want is multiple actual words, falls apart pretty quickly. Not to mention it doesn't store relationships between words, and it will basically never finish, because the search space is at least 333 sextillion combinations. If you're thinking, haven't you tried this with GPT-2 or some other machine learning model, my answer is: have you considered hash rates? Markov models are good for this because they're quick, especially compared to massive models like GPT-2 or some other ML model, which by itself is more than 5 gigabytes and tends to take multiple seconds to pop out a single prediction. We need millions of predictions per second for it to be viable in password cracking. The other side of this is that the more advanced models focus on making paragraphs or complete pages. We barely need sentences, and there are very few passwords I've seen that resemble more than a sentence fragment.
All right, heading back out of the weeds here, it's time to wrap up and summarize. If you're not an attacker, you've probably been sitting through this talk mildly terrified, waiting for some kind of recommendation. Allow me to make you wait slightly longer by addressing the red teamers first. Red teams, please don't cop out. 15-character passwords are hard, but they aren't that hard to crack. Even against bcrypt, you should at least be doing combinator attacks, PRINCE attacks, and Markov chains for the duration of your engagement. And don't pretend your results should approach zero. It should never be zero. There will always be somebody who picks something like "hello spring 2020" as their passphrase.
Blue team, your turn, finally. Do the math that applies to your situation, and do a little more than just gather people that scored high on the SAT more than a decade ago, pack them into a conference room, and wing it. If intercepted or dumped hashes or personal password reuse is a concern for you, and it almost certainly should be, feel free to take my math and re-present it. Unless your organization is cracking its own passwords, you will not know what people are using. Recommend the best behaviors; plan for the absolute worst. Which brings me to something that can serve as a starting point for what to pass along to end users. This is my personal recommendation, which of course is fairly paranoid, but the intent
here is to make sure a hash, regardless of what type of hash it is, can't be cracked for $500 of expense. Which means you're going to be making illogical or highly unlikely phrases of five or more relatively uncommon words that are thematically unrelated. That seems like a lot, but we're talking about "diabetic unicorn trap Chicago title" as a passphrase. It takes a little bit to make, but these things should be able to hang around for a while. Unless you force an attacker to use a large dictionary combining multiple types of words (cities, structures, fantasy creatures, verbs, medical conditions) and use some words that are uncommon, you're going to be building your security posture with cracks in the foundation. There will be some edge case that will allow them to crack your passwords.
So here's a very overfull slide of everything. Any questions? I am in the Discord channel waiting by this time. Rephraser is up there at the top, though it's probably just easier to look for "GitHub travco rephraser" in Google. And yeah, feedback is appreciated.
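As a rough illustration of the passphrase recommendation above (five or more uncommon, thematically unrelated words), here is a minimal Python sketch. Everything in it is an assumption made for the example: the `CATEGORIES` word lists are tiny hypothetical placeholders, and `make_passphrase` is an invented helper, not part of Rephraser. A real generator would need thousands of uncommon words per category to force an attacker into a large dictionary.

```python
import secrets

# Hypothetical, deliberately tiny word lists for illustration only.
CATEGORIES = {
    "cities": ["chicago", "tucson", "leipzig", "osaka"],
    "structures": ["gazebo", "aqueduct", "silo", "trellis"],
    "creatures": ["unicorn", "basilisk", "kelpie", "wyvern"],
    "verbs": ["trap", "whittle", "dredge", "forage"],
    "conditions": ["diabetic", "anemic", "gouty", "jaundiced"],
    "objects": ["title", "gasket", "ledger", "sextant"],
}


def make_passphrase(categories, n_words=5):
    """Pick one random word from each of `n_words` different categories.

    Drawing every word from a different category keeps the words
    thematically unrelated; the OS random source avoids predictable picks.
    """
    if n_words > len(categories):
        raise ValueError("need at least as many categories as words")
    rng = secrets.SystemRandom()
    chosen = rng.sample(sorted(categories), n_words)
    return " ".join(secrets.choice(categories[name]) for name in chosen)


if __name__ == "__main__":
    print(make_passphrase(CATEGORIES))
```

Letting the machine choose matters here: a phrase a person finds natural is exactly the kind of phrase a word-level Markov model is likely to generate.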