So, real quick about me. I have a bit of a reputation for approaching this from an academic mindset, simply because I got started in the whole password cracking thing through my research when I was getting my PhD. But I really do strongly believe in learning by doing, so I'm an active member of Team John the Ripper and I participate in password cracking contests like Crack Me If You Can. That's going on right now. Luckily, this talk is being filmed before the contest starts, so no spoilers here, unfortunately. But good luck to everyone else who's participating.

Password cracking really is my hobby, and a little bit of an obsession, but unfortunately it's not my day job. My day job has been very exciting recently, though, because I focus on medical device security, and you can imagine that with all the "greatness" around COVID-19 it's been an interesting time. One project that I really want to highlight is the open ventilator monitoring and alerting project that I've been helping to contribute to. There's actually a talk at the Biohacking Village this Sunday that I highly recommend people listen to. Some of our team members are giving it, and it's going to talk about the lessons learned and how other people can help contribute as well.

This has been a big problem because, as I'm sure you're aware, there's been a huge demand for ventilators to help deal with COVID-19. So a lot of different projects have stood up to try to produce low-cost ventilators to fill that need pretty quickly. Rather than have every single do-it-yourself ventilator develop its own monitoring and alerting framework, we're trying to produce one common one that can be applied to all these different projects across the board, because when you have these ventilators treating
patients, the patients are highly infectious, so you don't want the nurses exposed to that. But if something goes wrong, seconds count. So you need to be able to forward all the sensing information these ventilators are collecting back to a centralized nursing workstation, and you need to do that securely, because you're running this on a real hospital network. So that's been a really fulfilling project to work on.

Another thing I'm helping out with here, as I move my head, is running the DEF CON Biohacking Village's capture-the-flag contest. This was originally supposed to be in Vegas. That changed course, so now a lot of the equipment is actually sitting in my house. So I have to provide a way for hackers from all over the world to log in and hack these infusion pumps without also hacking my smart thermostat. As part of that, I actually had to repurpose one of my password cracking rigs, as you can see there, to run all the VMs that are helping to keep people on those ventilators, hacking those, and not hacking my smart thermostat.

Probably one of the first questions I should address here is: what does PCFG stand for in
PCFG password cracking? Originally, and I guess still technically, it stands for probabilistic context-free grammar, which is the modeling framework it uses: a model of how people create passwords. If you're into automata theory or formal languages, this might actually mean something to you. But most people hear that and think, "Oh god, that's like math and stuff. There's no way it's going to run on my computer." And then they kind of slowly walk away. So I decided I needed a more descriptive name, and I rebranded it the Pretty Cool Fuzzy Guesser. This explains a little bit better what it's actually doing under the hood: you train it on a list of passwords, and then it creates guesses that are similar to those passwords, but different, which is really important for expanding your cracking session. This is my favorite slide
I've ever made, so it's all downhill from here.

What it's really doing is using machine learning to crack passwords. And when I say machine learning, I mean that in the traditional sense of a whole bunch of if-then statements. So it's not using neural networks or artificial intelligence, but you are training it on passwords that you expect to be somewhat similar to the target passwords you're cracking. When it processes that training set, it extracts all sorts of probability information about the components of the passwords it finds there. It figures out things like capitalization masks, whether numbers go at the beginning of the password versus the end, the probability of individual letters and numbers found in the passwords, keyboard walks, and so on. It creates a model based on all those different types of probability information, and then it uses that model to generate password guesses in probability order. It starts with the most probable password guess, goes to the second most probable guess, then the third, and so on, until you crack the password you're trying to find, or you give up.

Let me just move my picture here a little bit. Just to tie this back into what's probably going on right now: as I said, I don't know what the actual contest is going to be like for Crack Me If You Can, but KoreLogic helpfully provided a brief summary of what the scenario is going to be. We're going to be targeting 12 different individuals, and those individuals change their passwords over time to deal with more complex password creation requirements. That sounds a little bit like something PCFG might actually be useful for, so I'm really optimistic for this contest. You'll see how optimistic
I am on Saturday, when we're actually giving this talk. But this is the kind of scenario this was originally developed for in the first place: you know how a subject creates passwords, so you want to create passwords similar to that, but you also want to change them because, for example, more complex rules or password creation requirements have been added on top. I'm probably available on Discord right now, so I'll be able to answer questions about how you might tweak this to help in a scenario like this.

The nice thing about there being a lot of academic papers on this is that when I give a talk like this, I don't have to create any of my own graphs. I can just go to other papers, look at the research other people have done, and pull out their graphs to talk about. One thing I want to highlight, though, and you need to look at this with a bit of a skeptical eye, is that all these cracking sessions are really short. I know one trillion guesses might sound like a lot, but when we start talking about GPU password cracking, you're talking about under a second to generate all of those, so that's no time whatsoever. Part of the reason for this is that the PCFG approach is very slow.
It doesn't scale very well with multi-threading currently. So when you start talking about the passwords you want to crack, it works very well when you're targeting very slow password hashes, where you can only make thousands of guesses a second because the hash is so slow. But when we start talking about things like unsalted MD5, other attacks are going to be much more effective, because you can make so many more guesses in the same time frame. With faster password hashes, you can certainly still use a PCFG to supplement your attack, and you can still crack some passwords you might not normally get. But in general, for the faster hashes, you're going to want to use more traditional types of password cracking attacks to really make use of the hardware you have available.

I want to focus on this graph, though, because this was a really neat study done by Carnegie Mellon University. One of the problems with academic research, especially on offensive tactics, is that the academics are running the attacks themselves. So you're looking at how effective students are at cracking passwords, versus someone professional. CMU took probably the most straightforward approach to solving that problem: they reached out to KoreLogic. You might have heard of them. They're running this Password Village.
They run the Crack Me If You Can competition. So when you're trying to find an expert, they're way up there, and they're a pretty good representation of that. What CMU did was give one of the KoreLogic engineers a password list and ask them to crack it. They recorded how many passwords were cracked against the number of guesses made, and then compared that against other cracking sessions as well. One thing that makes me smile every time I see this is that the PCFG did really well compared to the pros, which was KoreLogic, for that short cracking session. So when you ask, "Can this represent how a real professional password cracker operates?", the short answer is that it certainly appears to be able to.

Full disclaimer: when KoreLogic had more time, they definitely performed way better. This is a logarithmic graph, so that's about a hundred times more guesses. I'll also admit this wasn't fair to KoreLogic, because that's not typically how people crack passwords in real life, with such a short cracking session. When it is that short, it's usually against a really strong password hash, and you have a lot of time to manually tweak the attack you're running. That being said, if any of you are listening, I would love to have a rematch of this attack, just to see how it performs with all the new improvements that have been made to PCFGs, and I'm sure KoreLogic has been upping their game over the years as well. So that's why I'm somewhat hopeful that we'll find it useful in the contest that's going on right now.

So, enough about the research side. Let's talk about how to actually make use of this PCFG password
cracker. The first thing is you just download it from the GitHub repo. I really have strived to make the requirements as simple as possible: you need to have Python 3, and that's it. There's an optional chardet Python module that can help during training, because it helps detect the character encoding of the training set, and character encoding is the bane of my password cracking existence. But even that's optional, and it's actually now installed as part of pip, so if you have pip you probably don't need to install it yourself. This is really useful, because I find a lot of situations where I'm cracking passwords and I don't have internet access, so it's really nice to be able to quickly throw my tool on a box and get it to run. If you can run Python 3 on the box, you can probably run this. I've tried it on a bunch of different OSes. I've even gotten it to run on NetBSD, and it was pretty much the only thing I've ever gotten to run on NetBSD. So hopefully this is easier than your typical academic tool set to get installed and start cracking passwords as quickly as you can.

Let's talk about hardware requirements, because that's always an important part of password cracking. The PCFG tool set is single-threaded and CPU-bound, which is why it's so awfully slow. But it will use an entire CPU thread, so you really do need to dedicate one full CPU thread to the PCFG. The other thing is that it has very high RAM usage. It maintains a lot of different data structures in memory, and those data structures become more complex over time, so it just grows. I could have done some things to try to prune that, or move some of it to disk, but RAM is cheap, so I haven't, and it'll just keep on growing over time.
Initially it starts up with low usage, but if you're talking about running a password cracking session for a week or two, you really need at least 16 gigabytes of RAM fully dedicated to the PCFG tool set itself.

The next step is to actually make use of it and run it. I apologize up front that I tend to use the words "rule set" and "grammar" interchangeably. At least to me, they mean the exact same thing. I mentioned machine learning a couple of times: what I'm really talking about is that you have to train a grammar on an existing password data set. You may want different grammars for different targets. If you're targeting a web application catering to younger people, you might want to train on passwords that resemble that. If you're targeting corporate passwords, you might want to train on corporate passwords instead, and use those rule sets against the target passwords you think will match. You can have as many rule sets as you want, to really fine-tune your cracking session.

The default rule set that comes with the PCFG password cracker was trained on a subset of 1 million passwords from the RockYou data set, which came out in 2009 and was made up of web passwords, so there wasn't really any strong password requirement whatsoever. I've been thinking about updating that, so if you have a good data set that you think I should use, I'm open to hearing about it to make the default a little more effective. That being said, the RockYou data set is still extremely effective even to this day. It's just that "blink182" is not nearly as popular anymore.

After you have the rule set you want to use, you start generating guesses. It's a Python program.
So you just run Python 3 on the pcfg_guesser.py tool from the repo. You give it the name of the rule set. By default that's just "default", so if you don't specify one, it'll use the RockYou-trained rule set. Then you specify a session name as well. By default, that's "default" too. The session is used to restart a password cracking session, so if you have to cancel it for whatever reason, you can start it back up again.

I really want to highlight that the PCFG tool set is only a password guess generator. It will generate password guesses, and it will generate them in probability order: the most probable guess first, then the second most probable, and on down the line. It will not actually hash and crack any passwords, so you need another password cracking tool for that. Both John the Ripper and hashcat work, as does basically any other password cracking tool that accepts guesses from standard input. As I mentioned earlier, I'm on Team John the Ripper, so I'm going to use John the Ripper for pretty much all of my examples, but you can totally use hashcat as well.

To do this, you run the previous command that I talked about, and then you pipe it into, for example, John. John has an option called --stdin, so you pass that, and instead of reading from a word list or generating its own password guesses, it'll use the guesses that are piped into it. And you're cracking passwords.
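The pipeline described above can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: the `-r` (rule set) and `-s` (session) flags are how I understand the guesser's options, and the rule name `Default`, the session name, and `hashes.txt` are hypothetical placeholders. Check each tool's `--help` before relying on any of it.

```python
import subprocess


def build_pipeline(rule, session, hash_file,
                   guesser="pcfg_guesser.py", john="john"):
    """Return two argv lists: the guess generator, then the cracker.

    Assumption: the guesser accepts -r <rule set> and -s <session name>.
    John the Ripper reads candidate passwords from stdin with --stdin.
    """
    guesser_cmd = ["python3", guesser, "-r", rule, "-s", session]
    john_cmd = [john, "--stdin", hash_file]
    return guesser_cmd, john_cmd


def run_pipeline(guesser_cmd, john_cmd):
    """Connect the guesser's stdout to the cracker's stdin and wait."""
    producer = subprocess.Popen(guesser_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(john_cmd, stdin=producer.stdout)
    # Drop our copy of the pipe so the guesser notices if the cracker exits.
    producer.stdout.close()
    consumer.wait()
    producer.terminate()
    producer.wait()
    return consumer.returncode


if __name__ == "__main__":
    # Hypothetical names, shown only to print the equivalent shell pipeline.
    g, j = build_pipeline("Default", "demo_session", "hashes.txt")
    print(" ".join(g), "|", " ".join(j))
```

Keeping the command construction separate from the process handling makes it easy to print and inspect the exact pipeline before starting a long session.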
That's really all there is to it.

There are definitely optimizations for actually using this in the real world, though. The first thing I want to highlight is that a lot of times you want to know the status of a cracking session. The challenge when you're using a pipe is that if you hit the Enter key on your keyboard, instead of going to John the Ripper, it gets forwarded to my tool. So you need another way to get John the Ripper to output a status report. The way you do that is to send a SIGUSR1 signal to John the Ripper. If you're running this on a Linux system, you just type kill -SIGUSR1 and then the process ID of John the Ripper. When you do that, it's like hitting Enter in John the Ripper itself, and it'll output the status of its current cracking session. So now you can not only see the passwords that are getting cracked, you can also see the total number of hashes cracked so far. You can see, for example, the guessing speed.
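The status request described above can also be scripted. The following is a minimal sketch, assuming a Linux system where `pgrep` is installed and the cracker's process is named `john`; both are assumptions, and the helper names are made up for illustration.

```python
import os
import signal
import subprocess


def parse_pids(pgrep_output):
    """Turn pgrep's whitespace-separated output into a list of integers."""
    return [int(token) for token in pgrep_output.split()]


def find_pids(process_name="john"):
    """Look up process IDs by exact name (assumes pgrep is available)."""
    result = subprocess.run(["pgrep", "-x", process_name],
                            capture_output=True, text=True)
    return parse_pids(result.stdout)


def request_status(pid):
    """Equivalent of `kill -SIGUSR1 <pid>`: ask the process for a status line."""
    os.kill(pid, signal.SIGUSR1)


# Typical use from a second terminal while the pipeline is running:
#     for pid in find_pids():
#         request_status(pid)
```

The status text itself shows up in the terminal where John the Ripper is running, not in the script that sent the signal.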
In this case, it's making about four million guesses a second. And then you can see how long it's been running, and all the other stats as well.

I want to dig into the output of that cracking session, though, because I think it really helps demonstrate some of the power of using the PCFG. Here John is just showing the passwords as they get cracked, so you can see that it's not just figuring out one rule, exhausting that rule, and going on to the next rule, like you would see in a more traditional password cracking session. Instead, it's creating much more fine-grained rules and iterating between them depending on what the current probability is. As we see these passwords being cracked, it's kind of fun to try to figure out how the underlying system generated that password guess, and why it's making that guess right now.

Looking at the first ones here, this is pretty easy. It's just taking some five-letter word. I apologize, my microphone just died here. So, the fun of doing DEF CON remotely. So it's using five-letter words plus four digits. Moving on, though, this is an interesting one: "ssiscool". I looked, and "ssiscool" was not in my input dictionary or my training set at all. I found out that it was actually using multi-words here, so it was combining "ss" and then "iscool". One cool thing about this, and I'll talk about it a little bit later, is that instead of breaking this up into three words like we normally would think about it, it actually broke it up into two different words: "ss" and "iscool".
That way it can go through and ask: is Katie cool? Is Ali cool? Is Bob cool? Because there's a lot of cool people out there. So it can iterate through those and try that type of mangling rule. What's really cool about this is that it learned that "iscool" is a common word from the training set itself. I never actually programmed that logic into it. It learned it by itself by looking at the training data, which is, as I said, pretty cool.

You can see that after that it kind of went into a brute force. It wasn't pure brute force, and I'll talk about the different types of brute force later. It actually combined very short words, kind of like a combinator attack, but still, it was able to get that password that way. Then it went ahead and tried words with the same special character at the beginning and the end. You might be able to do that in a traditional password cracking session, but you'd actually have to have a rule to generate it, and trying to create those rules is a real pain, so you won't see them in most common publicly available rule sets. But it was able to learn that from the training data, which I thought was pretty cool as well.

Down here, and I need to get off the screen a bit, you can see it's trying some longer words. While these are normal words, it actually generated them via multi-words as well, like "finger" plus "nail", or "ninety" plus "nine". Once again, this is really useful, because now you don't have to have things like ninety-nine, ninety-eight, or ninety-three in your word list, because it's generating those on the fly. Let's go down a little bit further here.
This is your more traditional kind of rule. It's just two digits plus capitalizing the first letter of the word. You can see it's starting to do that, but "Settings" is a pretty uncommon password word, so it's trying it later in the cracking session. And now it's combining even more mangling rules: it's doing a multi-word of "wood" plus "fish", and "Tara" plus "Don", and adding digits to the end of that as well. So you can see how it starts stacking these different rules together. I want to highlight that this cracking session has been going on, as you can see from the stats output, for about 13 minutes. So all the really easy passwords have already been cracked. It's already guessed "123456" and "password" and "12345" and so on. These are starting to get into the fuzzier parts of the rule set that you might not normally see in a normal password cracking session.

As I mentioned a little bit earlier, if you hit the Enter key it's going to go to my program, not John the Ripper. But there's a lot of information that I want to provide to people about the stats of the cracking session. So whenever you hit Enter, or basically any other key, it'll display what it's currently doing, so you can figure out whether you want to continue, whether it's working correctly, and whether it's doing what you want it to do.

Going through this here, I hit Enter twice, so you can see how it's generating these password guesses. The first one is basically trying to combine two words, so it's a multi-word type of attack. If you dig into the real details, you can see that it's trying the 143rd most probable word with no capitalization, combined with the 93rd
most probable four-letter word, also with no capitalization. So you can see that the probabilities it assigns, even to individual words, are very, very fine-grained. It's going to try some words, then do other mangling rules, and then go back to the less probable words later on in the cracking session.

Now this next one here (I'll try to get out of the way): you can see that it switched to a real brute force attack using OMEN, the Ordered Markov ENumerator. I'll talk about that a little bit later. What I want to highlight is that it's mixing the more traditional cracking rules. It's combining words, then switching to brute force, and then it'll switch to another mangling rule after this, and it'll just keep on going based on whatever the current probability is.

As I said, I really struggle with documenting my code, so I try to build as much documentation into the runtime behavior as possible. If, instead of hitting Enter or anything else along those lines, you hit H (just H, actually), it'll print out a report of what all these different fields mean. That report is actually much longer than what's displayed on the screen here, but it explains what all those different labels, like A5 or C5, actually stand for.

The one thing I really want to highlight, though, is this one metric called probability coverage. Since the PCFG password cracker creates guesses in probability order, it starts with the very highly probable passwords and goes to less probable passwords, and less probable passwords, and the model it has will basically never finish. It'll just keep on finding new combinations of words to go through. So a real challenge becomes: when do
you give up on a cracking session? If you haven't cracked the password, when should you kill this off and try some other type of cracking attack that might be more successful? Or when do you just say, "I'm not going to crack this password," and move on to a different case? Probability coverage is a very fuzzy metric that I developed to try to give you a bit of a rule of thumb for when that should be. What this metric says is: if the target password comes from the same probability distribution as the password set I trained on, and if my grammar, my model of how these passwords were created, was exactly correct, this is the probability that we would have cracked the password by now. Neither of those assumptions is actually true in real life. The probability distribution of the password you're trying to crack is probably very different, and the grammar I generate and train is absolutely not perfect. But, as I said, it at least gives you a rule of thumb: "OK, this is getting a little high. It says I had a 90% chance of cracking this password, and I haven't cracked it yet. Maybe I should give up." You'll notice this number jumps up really high initially, because it's making high-probability password guesses, and then it slows to a crawl, almost not advancing at all, after you get through 70, 80, or 90 percent. So this is really good for figuring out when to devote that one CPU thread and that RAM somewhere else.
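To make probability order and probability coverage concrete, here is a toy sketch. It is not the tool's actual implementation: the two-slot grammar, the slot labels `A` and `D`, and every probability below are made up for illustration. Each guess's probability is the probability of its base structure times the probabilities of the values filling its slots, a priority queue hands guesses out from most to least probable, and coverage is just the running sum of those probabilities.

```python
import heapq
from itertools import islice
from math import prod

# Hypothetical toy grammar: base structures and per-slot values, each with
# made-up probabilities. Slot values are sorted from most to least probable.
STRUCTURES = {("A", "D"): 0.75, ("A",): 0.25}
TERMINALS = {
    "A": [("pass", 0.75), ("love", 0.25)],
    "D": [("1", 0.5), ("123", 0.25), ("99", 0.25)],
}


def guesses_in_probability_order(structures, terminals):
    """Yield (guess, probability) pairs from most to least probable."""
    heap = []
    seen = set()

    def push(struct, idx):
        if (struct, idx) in seen:
            return
        seen.add((struct, idx))
        p = structures[struct] * prod(
            terminals[slot][i][1] for slot, i in zip(struct, idx))
        heapq.heappush(heap, (-p, struct, idx))  # max-heap via negation

    for struct in structures:
        push(struct, (0,) * len(struct))  # most probable fill of each structure

    while heap:
        neg_p, struct, idx = heapq.heappop(heap)
        guess = "".join(terminals[slot][i][0] for slot, i in zip(struct, idx))
        yield guess, -neg_p
        # Children: advance exactly one slot to its next most probable value.
        for pos, slot in enumerate(struct):
            if idx[pos] + 1 < len(terminals[slot]):
                push(struct, idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:])


def coverage_after(structures, terminals, n_guesses):
    """Total probability mass of the first n_guesses guesses."""
    gen = guesses_in_probability_order(structures, terminals)
    return sum(p for _, p in islice(gen, n_guesses))


if __name__ == "__main__":
    running = 0.0
    for guess, p in guesses_in_probability_order(STRUCTURES, TERMINALS):
        running += p
        print(f"{guess:10s} p={p:.6f} coverage={running:.6f}")
```

In this toy grammar the first two guesses already account for almost half of the probability mass, and each later guess adds less, which is the jump-then-crawl behavior described above. A real grammar has far more structures and values, so it effectively never runs out of guesses.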
Another usage tip I'd like to highlight is that sometimes the cracking dynamic, when it comes to speed, is completely reversed. You might be trying to crack very computationally expensive password hashes, or a lot of salted hashes, in which case you're really only making a couple of guesses a second. Meanwhile, this generator is producing somewhere between 100,000 and 4 million guesses a second, so it gets backlogged and essentially freezes while it waits to be able to send more guesses to the password cracking program. So occasionally, if you hit Enter, it won't display the status, or it'll take a while to display it, and that's usually what's happening. If that happens and you're curious whether the password cracking session has crashed, I recommend going back to my earlier advice about sending a signal to, let's say, John the Ripper, and seeing how that's doing, to make sure your cracking session is still running.

As I've talked about, the multi-word feature has probably been the biggest addition in the new 4.0 rewrite, and it has completely shocked me how effective it has been. I won't go into too much detail about it, but the one thing I really want to stress is that it is not language-specific at all. It learns what constitutes a word from the training set you give it. So it'll pick up things like new band names, or proper nouns that are really hard to specify in a language dictionary, or whatever new Pokemon just came out. And it identifies patterns like "ilove" and so on. This is very useful for targeting new password hashes.

As I said, it's not language-specific, but it works best with, I would say, European, English-type languages. It really struggles
still with some other languages, like Mandarin, but that is absolutely something I want to focus on more going forward. It's not perfect. It's definitely a work in progress. There is a balance with creating false-positive matches: if it doesn't see some of the base words by themselves in the training set, it won't identify them. But it's something that is evolving, and a new pull request I just received from somebody else has some improvements to this that I'm really excited about getting pushed into main.

One of the other big features that has been added recently is OMEN, the Ordered Markov ENumerator. The whole reason I talk about this is that a similar approach can be taken for pretty much anything. If someone creates a better cracking attack or cracking mode, it can totally be incorporated into a PCFG-style attack. I'll be a little bit like the Borg in that respect. The real challenge is figuring out how to assign a probability to a password guess. If you can assign a probability to a password guess, I can probably incorporate it into a PCFG.

So, in the last little bit here,
I really want to highlight some additional tricks that are very useful when it comes to cracking passwords. The first one is the --skip_brute flag in the PCFG guesser. Basically, what this does is disable OMEN guess generation. That's not to say OMEN guess generation is a bad thing to do. It definitely helps increase the success of a password cracking session. But this is a way to parallelize your attack. If you have another system that's really cranking through your brute force attack, you might want to do all your brute force on that other system, or on another thread, and run the PCFG guesser just to focus on the word mangling rules instead. To do that, you just add --skip_brute when you run it.

Another flag that's really useful is the --all_lower flag, which stops it from doing any sort of case mangling on the password guesses. Let me try to move my picture a little bit here, just to make it easier to read. I apologize. OK. A lot of times you may not want to do case mangling inside the PCFG itself. One reason might be that the hash you're targeting is case-insensitive, like LANMAN. That's probably not the best example, though, because if you're cracking LANMAN hashes you're not using a PCFG to do it. You just brute force that sucker and take it out that way. Where it's more likely to matter is that case mangling is very distinctive to how an individual does it. If someone does a certain type of case mangling, they have a tendency to keep using that strategy for all their other passwords. So when you start doing things like targeted password cracking, you may not want to just do what everyone does. You want to make a really specific case mangling for that particular individual in
that case. The better way to do this is that John the Ripper supports a really powerful feature called pipe. What pipe does is, instead of just taking the guesses in from standard input and using them as is, you can apply additional rules on top of them, like you would in a traditional dictionary-type password cracking attack. So you can specify your very specific case mangling rules inside John the Ripper's rule set, then pipe the lowercase password guesses right into John the Ripper and have John the Ripper capitalize them itself. That can be very powerful when you have an idea of what type of case mangling you want to target.

And I of course moved it to the wrong portion here. Let me move my screen again. I apologize.

Some coming improvements. As I mentioned, there was an amazing pull request submitted to me with a bunch of new features. I'm slowly incorporating them into the core, but I actually have the features available as their own tool called segmenter.py. And I apologize if I mispronounce the name, because I've only seen it written, but Chuanwang Wang submitted this, and it really impresses me. Probably the biggest feature I'm really excited about is leetspeak replacement. This has been a feature that has been kind of my white whale as far as implementing it goes: every single time I've gone at it, it's just not been very effective. But that's currently incorporated into this tool he has called segmenter.py, which has been included in the repo, and it will try to parse that information out. So I'm looking at getting that incorporated into my core trainer and into the password cracking sessions in order to really target that. He also improved some of the multi-word detection.
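Going back to the pipe trick for a moment: the idea of feeding lowercase guesses into a second stage that applies a target-specific capitalization can be sketched in a few lines. This is only an illustration of the concept, not John the Ripper's rule engine; the rule names (`capitalize_first`, `upper_last`, `upper_all`, `none`) and both function names are made up for this example.

```python
def apply_case_rule(guess: str, rule: str) -> str:
    """Apply one named (hypothetical) case-mangling rule to a lowercase guess."""
    if rule == "none":
        return guess
    if rule == "capitalize_first":
        return guess[:1].upper() + guess[1:]
    if rule == "upper_last":
        return guess[:-1] + guess[-1:].upper()
    if rule == "upper_all":
        return guess.upper()
    raise ValueError(f"unknown rule: {rule}")


def mangle_stream(guesses, rules):
    """Yield each guess under each rule, skipping repeats for the same guess.

    `guesses` can be any iterable of lines, for example sys.stdin when the
    lowercase guesses are piped in from another program.
    """
    for raw in guesses:
        guess = raw.rstrip("\n")
        seen = set()
        for rule in rules:
            candidate = apply_case_rule(guess, rule)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate
```

The point of keeping the rule list outside the guess generator is that it can be tailored to one person's habits, for instance only `capitalize_first`, without retraining anything upstream.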
So he made that better, and he also has incorporated some new approaches into the password scorer, which is a different tool: you submit your password to the password scorer and it'll tell you what the probability of your password is, which is kind of nice as well. So all credit goes to him for these really impressive features here. And if anyone else is looking at helping out too, I'm all about that. So thank you very much once again for that.

Okay, so let me move my screen around again here. Okay. The next thing I want to talk about is the compiled PCFG guesser. I've been talking about the Python tools all along up to now. The compiled PCFG guesser is a completely different fork, and, as you can kind of get from the name, instead of being written in Python it's written in compiled C code. It's a little bit harder to actually get installed and running, simply because when you start talking about compiling code, it runs great on my machine, but it has challenges elsewhere. I tried to use the hashcat build Makefile for this, so if you can build hashcat on your computer, you have at least a better chance of being able to get this running as well. But if you have problems, please reach out to me on the GitHub site and I can try to help you fix them.

I will say that the trainer portion will always be written in Python. I just like writing in Python too much to change that over. So basically you'll create the trained rule sets with the Python trainer, and then copy them over to be used in the compiled version. Also, the compiled version has a tendency to lag in features behind the Python toolset because, once again, I like writing Python. I'm not the best C coder in the world. Basically, if I write a hello world program, it's going to have like five buffer overflows and a segfault, so take
that for what you will. But I'm making this available, and if someone wants to write a better one, I'm truly open to that as well. It doesn't have save/restore, it doesn't have status output, and it has no OMEN guess generation. So, all that being said, why bother? Why is this here? Really, at the end of the day, the main reason is that it's about 20 times faster than the Python toolset. I've always heard that C code is faster than Python, but when I saw it I was like, holy crap. So I will be up front: even with all these limitations, when I'm cracking passwords I'm using the compiled C version much, much more often than I'm using the Python one, because that 20x speed improvement is hard to beat for most password cracking sessions.

Now I'm going to talk real quick about training on passwords. I've been talking about this a lot, and there are a lot of different reasons why you'd want to create a new password training set. Language is a huge one: you want to train on passwords that are similar to the ones you're trying to target. Another big one is that corporate passwords are very, very different from what you'll see from websites. I'm sure you probably heard KoreLogic talk about this yesterday, but it's very evident. If you're trying to target corporate passwords, you probably do want to train on corporate passwords versus training on passwords from some gaming website.

Another reason to train, though, is if you're targeting a specific password creation policy, or you know which mangling rules your target prefers. One way to really target that is to train only on passwords that match that policy. And there's other things you can do. With the rules, or the grammar, that I generate, I made sure that I didn't include anything like a
CRC check or any of those kinds of checks in there. So you can actually open up the files themselves. They're just text files, and you can start editing the probabilities of different things in them by hand too. If you say, oh, this is one word I really want to make highly probable, but I don't want to have to train on a whole new training set, you can just open the file up, put that word in there, give it whatever probability you want, and that will just be read in and used in your password cracking session.

The other reason to train on a password training set is that it extracts a bunch of information from that password set, so it's really useful for analyzing a new dump that you have access to. For example, it'll pull out common emails, and it'll pull out dates and websites, to try to help you figure out where this password data set came from.

The next question, of course, is where do you get these password data sets from?
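As a rough illustration of the hand-editing idea above, here is a minimal sketch that gives one word a chosen probability inside a list of rule-file lines. It assumes each line is `value<TAB>probability` and that entries are listed most probable first; both are assumptions made for this example rather than a description of the real file layout, and `set_probability` is a hypothetical helper, not part of the toolset. Rescaling the other entries so everything still sums to 1 is also a choice made here, not something the talk requires.

```python
def set_probability(lines, word, new_prob):
    """Return new 'value<TAB>probability' lines with `word` set to `new_prob`.

    All other entries are scaled down proportionally so the probabilities
    still sum to 1, and the result is sorted most probable first.
    """
    if not 0.0 < new_prob < 1.0:
        raise ValueError("new_prob must be strictly between 0 and 1")

    # Parse every entry except the word being (re)assigned.
    entries = []
    for line in lines:
        value, prob = line.rstrip("\n").split("\t")
        if value != word:
            entries.append((value, float(prob)))

    # Shrink the remaining entries to make room for the new probability.
    old_total = sum(prob for _, prob in entries)
    scale = (1.0 - new_prob) / old_total if old_total else 0.0
    entries = [(value, prob * scale) for value, prob in entries]

    entries.append((word, new_prob))
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return [f"{value}\t{prob}" for value, prob in entries]
```

For example, calling it on `["password\t0.6", "dragon\t0.4"]` with the word `defcon` and probability `0.5` puts `defcon` first and halves the other two entries.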
So there's a lot of challenges with this too, because a lot of the data sets are not optimal when it comes to training on them. I don't know if you know of hashes.org, but it's a really great site for downloading all these dumps as they come out. For example, let's say you want to train on this data set here. I'm not going to try to pronounce the name of that site, because I'm sure I'll just horribly, horribly mangle it, but when I did some googling about it, it was a site for new college students trying to find a job in China. So that's kind of an interesting data set that you might want to use to train for cracking passwords.

If you download something from hashes.org, the first, most important thing is to select the plain-text option to train your rule set on, because you don't want to include the hash part. The trainer will think it's part of the password, and it just goes poorly. One other thing I want to highlight here, and this is a feature that I'm hopeful to get added to the PCFG toolset: I was informed by the owner of the site that they actually do some additional things for encoding non-UTF-8 characters that my trainer will not fully parse correctly. So that's something I need to add in, so that it uses the correct character encoding for non-English passwords. I just want to put that warning out there for anyone trying to train on things like Mandarin.

One problem with a lot of these dumps is that they don't contain duplicates. Duplicates are really important when it comes to figuring out what the probability of a password is, because if you don't have duplicates, 123456 looks like just a very random string. So that's useful to know, but I will say, when you run longer cracking sessions with PCFGs, that
lack of duplicates becomes less and less important, because you've already exhausted all the really probable password guesses. The one issue, though, is that the OMEN portion really does struggle without the duplicates. So you might not want to enable OMEN guessing if you train on a data set that doesn't contain any duplicates.

The other problem with these dumps is that they only contain the passwords that have been cracked. So basically you don't learn anything about the passwords that haven't been cracked. That's not a deal-stopper, but it's useful to keep in mind that the cracked percentage is going to be very useful when it comes to figuring out how good a data set is for creating a new rule set.

In order to train on a password data set, really, you just run the trainer program. You give it the name of the rule set that you want to create, as well as the password data set that you want to train on, and it'll go ahead and do all the parsing and stemming of the password data set. It will try to auto-detect what the encoding is, but when in doubt, set it to UTF-8, because the encoding really does matter quite a bit.

It actually makes a couple of different passes through the same data set in order to learn more and more about it. On the first pass through the data set, it learns all the character frequencies and the base words for multi-word detection. On the second pass, it does much of the real parsing of the passwords, so it figures out things like keyboard walks, alpha strings, how probable digits are, and stuff like that. So most of the stuff you traditionally think of when you talk about the probabilities of different things, it does on the second run-through. And it actually makes a whole 'nother third run-through then to see
about how effective things like OMEN would be for cracking passwords. That gets back to how OMEN generates the probabilities associated with its different levels. And so this takes a while. If you're training on a million passwords, it's done within a minute or two. If you're training on a billion passwords, it takes significantly longer. And it has to keep all this data in memory, so if you're training on some of these really gigantic data sets, it's just not going to work. One thing you might want to do is select a subset of that password set, chosen randomly, to train your rule set on instead.

After you're all done with that, it'll display statistics about the data you just trained on, which are really useful for figuring out where it came from: password lengths and stuff like that. The one thing I've added that I've found really useful is that it'll display the top URLs. The ones at the very top are usually web email providers, but if you start going down a little bit, you can usually see what the website is, because people have a tendency to use the website name in their password.
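The random-subset suggestion above can be done without ever loading the full dump into memory. Below is a minimal sketch using reservoir sampling; `sample_passwords` is a hypothetical helper written for this example, not part of the trainer. It samples lines rather than unique passwords, so a password that appears many times in the dump is still proportionally more likely to show up in the subset, which preserves the duplicate information discussed earlier.

```python
import random


def sample_passwords(lines, k, seed=None):
    """Return k lines chosen uniformly at random from an iterable of any size.

    Reservoir sampling (Algorithm R): only k lines are held in memory at a
    time, so the input can be a file object over a very large dump.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, line in enumerate(lines):
        if index < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep this line with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = line
    return reservoir
```

A typical use would be to open the dump as a text file, pass the file object in as `lines`, and write the returned subset to a smaller file to train on.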
I also highlight the dates that it finds in there as well, because that's useful for trying to date when that password set got leaked. Now, I want to highlight that there's a long tail when it comes to the dates, because people create passwords before the password set gets stolen. So you'll see a lot of passwords with years from before the data set actually gets dumped, but if you go down the list a little bit, you can say, okay, that's probably about where the cutoff was for when this password set was disclosed.

One last thing I really want to talk about real quick is that I am trying to get this to work with other cracking modes. One of the really popular password cracking modes is called PRINCE. PRINCE basically takes a lot of different words, combines them all together, and makes lots of guesses based upon that. But one challenge with PRINCE is that it's very dependent upon the input word list that you give it. The word list needs to have high-quality words in it, but it also needs to have a level of cruft in there too, because if you want to, let's say, add the number one to the end of a word, you have to have "1" in your word list by itself. The challenge is that the larger your word list is, the more words it's trying to combine, and then it starts to have issues as well.

We have all this probability information about how passwords were generated, so maybe we can use it to create very bespoke word lists for a PRINCE-style guessing session. So I created another tool called PRINCE-LING that basically just does that. I'm sorry, my microphone just went out again there.
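To make the word list trade-off above concrete, here is a minimal sketch of PRINCE-style chaining. It is plain enumeration with `itertools.product` and does not reproduce the ordering or optimizations of the real PRINCE attack; `prince_style_guesses` and its parameters are names invented for this example.

```python
from itertools import product


def prince_style_guesses(words, max_elements=2, max_length=16):
    """Yield every ordered chain of 1..max_elements words that fits max_length."""
    for count in range(1, max_elements + 1):
        for chain in product(words, repeat=count):
            guess = "".join(chain)
            if len(guess) <= max_length:
                yield guess
```

With the two-entry list `["pass", "1"]` this can only produce `pass1` because `1` is present as its own entry, which is the cruft requirement described above. It also shows the cost of a large list: with `n` words and chains of up to two elements there are `n + n**2` candidates before the length filter, so a smaller, better-chosen list pays off quickly.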
But yeah, so it creates a very high-quality word list, and does it automatically, because one thing I like about PRINCE is that it's the kind of attack I run when I want to goof off. Password cracking sometimes takes a lot of brain cells, because you're looking at how you're cracking and trying to optimize your cracking session. PRINCE is like: I have no idea what I want to do, I want to go watch Tiger King on Netflix, let's just launch this off and come back and see if it was successful. And PRINCE is usually actually quite successful. So it's a pretty good tool to use, and anything that you can do to automate PRINCE even more, I'm all for, which is why I went ahead and created that.

So I'm going to go ahead and stop the live stream here, and hopefully I'll be on Discord in order to answer any questions that you have. I hope you enjoyed this. I hope this was helpful. And once again, thank you for attending the Password Village here at DEF CON Safe Mode.