Hello, CCC. I'm Tonimir, and I'll be talking about Unhash: methods for better password cracking. The core of my talk is my research into how to automate the boring work of checking passwords on online systems and how to improve password cracking attacks against offline passwords. The idea is to contribute and create reusable elements that can be simply integrated into other tools like Metasploit, or any other tool, with openness and reusability in mind. So let's start with the psychology of passwords. What kinds of passwords exist? First, there are default passwords or default backdoors that somebody was smart enough to integrate into a system, like root, support, admin and so on. Then there are real passwords, because people need a system to remember a password. Each of you has a system for how you create a password and how you remember it, right? People use combinations and mutations, they add words, they have their own ways. And there are pseudo random passwords, because you cannot say that a password is truly random, except maybe in some special cases. Something like this one: if somebody can memorize it, that's awesome, but usually I can't. And that's why we have systems to remember passwords. The talk is split into two parts. The first part is the boring one, so we put the interesting thing at the back. The first problem is how to check systems for default passwords on network devices like routers and switches. The second is how to check for embedded backdoors: not programmatic backdoors, but backdoor passwords that somebody left in an account. And the third one is how to obtain data from automated attacks. The boring problem here is default passwords on devices. Each of you has a router at home, or some other type of device. Usually they come pre-configured with a default password. In production, that problem usually shows up as some kind of forgotten device.
Think of a situation like "we need wireless right now, just drop in a device and let's get it working." Or test systems, where a database has one username and password combination for the database admin and another one for the system admin, and the second one never gets changed. Or sloppy configurations. That's the boring problem. The mildly interesting problems are backdoor passwords and the collection of data on attacks in the wild. If you want to collect default username and password pairs, one of the things I did was simply scrape all the default password lists. So I would first like to thank all the nice guys from Phenoelit, CIRT.net, Liquidmatrix, DexNet and Security Override, who left their lists of default device passwords on the net. The lists are crawled, organized and weighted by occurrence, which is useful for testing whether a device has a forgotten password. Ordering the collected data by occurrence is the only useful metric we have for analysing default passwords. If you have an idea for a better one, I would like to hear it. If a username and password pair is more common, like admin/admin or admin without a password, it will simply float to the top and be tested sooner, so this saves time. Other metrics could be by vendor or by device, but that usually doesn't cover the use cases we want: for instance, testing for forgotten devices or testing for weak devices. Somebody will say, why would you want to test online devices? What if somebody has fail2ban or SSHGuard? Well, then they certainly didn't leave the default passwords in place. The code for this simple thing is available on GitHub. You can use it, and it is also available in the Metasploit Framework as default_userpass_for_services_unhash, if you trust my maintenance of this part. The mildly interesting problem is hard-coded backdoors. There is an interesting story.
I obtained an HP storage device, which is a SAN storage device, and after some time there was this interesting post on the net that somebody had found embedded backdoors in HP storage devices. And guess what? I tried the default username HPSupport and the password "badgers" written in leet, and you can, of course, access the administration part. Billy Rios, at this year's Black Hat, showed that almost all airport security controls have integrated backdoors: access control devices that open doors or track which staff are registered on a floor, or things like the Morpho Itemiser, which is the device that scans for narcotics or explosives when they swab you. Those are the usernames and passwords, and of course all of those devices are networked. Unfortunately, embedding usernames and passwords as backdoors is becoming more and more of a trend. For instance, how many of you know about last year's leak from a popular VPN, firewall and anti-spam vendor, which embedded multiple backdoor accounts into their devices? The best part was a MySQL database with the default user "product" and a blank password, together with a remote SSH key with remote access capability. Does anybody remember that? Well, this is starting to be a bigger and bigger problem. Devices like that cannot be considered secure, and I cannot believe that somebody would want to use them. I would like to say: people, at least start auditing the stuff that you own. And when I say own, I don't mean own in the bad sense. But please first consult your lawyers and your local laws, because this can bring you all sorts of trouble. The third problem for online attacks is how to collect data sets from attackers, if you want to collect data that is used in online attacks, for instance what botnets use. One interesting resource for this is SSHPot, which collects data from SSH honeypots.
So if you run Kippo or some other honeypot technology, please ship data to them. It will be more than useful. That enables you to collect information on public-facing brute-force attacks. And when you download all the username and password combinations that are used, you start to see some interesting patterns. For instance, very popular, with a high number of hits, is root with, let's keep it anonymous, a root or vendor name plus 46. When you Google it, it's not a default password. You cannot find it as a default password. The second one is the URL of a popular ISP plus some random string. The third one is a research corporation's name repeated twice, and there are other combinations that you cannot find as default starting passwords of any system. Now, there is an interesting question. Are those backdoors, or did somebody have inside info on how to break some systems and simply tries everything? Let's not jump to conclusions. But I would ask all of you who are interested in such things to take a look at the public-facing attack data, and you might find some interesting stuff in there. Why are we losing on this front? Let's keep it simple: we don't have one centralized repository that collects all known backdoor data, all known backdoor passwords and all known starting passwords. When you take a look at any tool that ships password lists for testing, you'll see that you cannot find out where a list came from. Who created it? For what use case? Why should you use it? There is no explanatory data that says how the password list was created, there is mostly no way to integrate it with other tools, there are no updates, and so on. Reading GitHub pull requests is not ideal when you want to see how something ended up in a tool. This is a manual process, and as you see, this part is boring. Nobody can be bothered to add passwords or maintain those lists.
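The occurrence weighting mentioned above, for scraped default-password lists as well as honeypot captures, can be sketched in a few lines of Python. This is a minimal illustration, not the code from the Unhash repository: the function name `rank_credentials`, the sample data, and the choice to count each pair once per source are all assumptions made for the example.

```python
from collections import Counter

def rank_credentials(sources):
    """Merge several credential lists and order pairs by occurrence."""
    counts = Counter()
    for pairs in sources:
        # Assumption: count a pair once per source, so a single list that
        # repeats an entry many times cannot dominate the ranking.
        counts.update(set(pairs))
    # Most common pairs first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample data standing in for three scraped lists.
list_a = [("admin", "admin"), ("root", "root"), ("admin", "")]
list_b = [("admin", "admin"), ("admin", ""), ("support", "support")]
list_c = [("admin", "admin"), ("root", "toor")]

ranked = rank_credentials([list_a, list_b, list_c])
for (user, password), seen in ranked:
    print(f"{seen}\t{user}:{password}")
```

Pairs that appear in the most sources, such as admin/admin, float to the top and would be tried first during an audit.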
So I really have the highest regard for the people who maintain those lists, because somebody has to do it, and I would like to say just one thing. We need a centralized repository of that stuff. Let's maintain one repository, not six unconnected repositories of that data. Let's make it easy to integrate into tools, and let's enable the data to be vetted. So if you want to work on this, just contact me and we'll figure something out. Now from the boring part to the interesting part: how to speed up cracking for passwords that are not pseudo random. Let's first find out how people create passwords. That's the question. If you can see how somebody creates their password, then you can probably create a better attack. Why would you want to do that? Because it's connected to another thing: you want to create word lists or other data targeted at a specific class of users. For instance, if somebody wants to attack people here, they'll use security, activism, 31C3 and other keywords, and data derived from those, to drive the attack. So what choices do we have when we want to crack a lot of hashes or passwords? The first is always the simple one and always the most fun one: build a faster cracking machine, with multiple GPUs, multiple processors, clusters and so on. The second way is to attack the algorithm. For instance, MD4 is known to be so weak that you can break it with a pen and a piece of paper. The third one is bypassing, which means using something like pass-the-hash attacks or differential power analysis, or doing all the things that the NSA does, to obtain the password somehow from the machine without cracking the real password at all. The fourth one is to not just brute force the password but try a smarter attack. So people use word lists, then rule files, then time-memory tradeoff attacks like rainbow tables, then Markov attacks, mask attacks, fingerprint attacks or something else.
So, with the mandatory XKCD reference: people create passwords like this, right? Passphrases. How can you crack that with a normal tool? It's a little bit more difficult, right? So I wanted to create something where you could write something like this, which is an actual rule file that says: just permute all combinations of four words from a word list and let's see what comes out of it. The idea behind that was a data-driven approach. So how do you learn how people create passwords? First, you need data, and data is, believe it or not, hard to find on the net, at least quality data. Some nice people on the internet contributed a lot of good research data, which means clean password dumps with occurrences. They should be raw hashes or all plain text, not just a snippet or some phished parts, which means that your choices are really limited. The learning set was 32 million passwords, which means 14 million unique passwords. All Cyrillic, Greek and other non-Latin lettering was dropped, because I didn't want to include those languages in the learning process, since I don't know them well. The three biggest lists that were available were the RockYou, phpBB and Yahoo lists. Anybody who does password research knows them as popular public lists. The verification data was two sets of lists. The first one was LinkedIn, which is the second biggest leak after RockYou, and the second one was from Elitehacker, Carders.cc, Hak5 and MySpace. The first idea was: let's use a machine learning algorithm and figure something out, which failed miserably. After that, I created something called sieving, which is the following idea. Just create a classifier for each identified password class. You cannot classify all passwords. You can't figure out how all passwords are created. So let's just look at a subset, and create a classifier that can identify that subset.
Then, for each of the classes, create one classifier, decide which ones are probably the most correct, and dump everything in a format that enables analysis. These are some of the password classes that I identified. Weak patterns, like just repeating some letters on the keyboard. Keyboard patterns: some passwords that I thought were really strong were really keyboard patterns. The pseudo random password you saw on the first three or four slides was actually a keyboard pattern. It looks like a really strong password, but it's created by just following keys sequentially on the keyboard. You can also see all kinds of other stuff. Mutation is anything that changes a word: change O into zero, change I into one, and so on. Combination means combining a few words or a few elements. Or people simply use single elements. Each of those classifiers can be iterated, so you can tear passwords apart. The second problem is how to identify languages. For instance, if you say security, you know that it's popular in all languages, but what about something like plaza? What is that? How can you identify that word? If you want a good word list, you simply can't rely on what is available. First off, the Google n-gram corpora are too big, and the same goes for Hunspell and Aspell, which also fail on one thing: people don't create passwords from grammatically correct sentences. They use all kinds of stuff, for instance things from a popular series, like names of characters, or names of places from popular books, and so on. So one ideal source is Wikipedia, because Wikipedia has all kinds of articles that are not really bound by language, right? So I dumped the English, German, Croatian, French, Spanish, Italian and Dutch Wikipedias, and that created a big word list that can be used to identify words.
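The sieving idea described above, one small classifier per password class with everything unmatched set aside for later analysis, can be sketched roughly as follows. This is a hedged illustration and not the Unhash implementation: the function names, the US QWERTY layout, the three-key minimum run and the yes/no classifiers (instead of probability scores) are assumptions made for the example, and only horizontal keyboard walks are detected.

```python
QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Shifted digit row mapped back to its unshifted keys (US layout assumed).
UNSHIFT = str.maketrans("!@#$%^&*()", "1234567890")

def is_repeated(password):
    """Weak pattern: a single character repeated, e.g. 'aaaaaa'."""
    return len(password) >= 3 and len(set(password)) == 1

def longest_row_run(keys, start):
    """Length of the longest run of horizontally adjacent keys from start."""
    best = 1
    for row in QWERTY_ROWS:
        for line in (row, row[::-1]):  # left-to-right and right-to-left
            pos = line.find(keys[start])
            if pos < 0:
                continue
            n = 1
            while (start + n < len(keys) and pos + n < len(line)
                   and keys[start + n] == line[pos + n]):
                n += 1
            best = max(best, n)
    return best

def is_keyboard_walk(password, min_run=3):
    """Keyboard pattern: the password splits into runs of adjacent keys."""
    keys = password.lower().translate(UNSHIFT)
    if len(keys) < min_run:
        return False
    i = 0
    while i < len(keys):
        run = longest_row_run(keys, i)
        if run < min_run:
            return False
        i += run
    return True

# Hypothetical classifier registry; new classes are added as they are found.
CLASSIFIERS = [("repeated", is_repeated), ("keyboard", is_keyboard_walk)]

def sieve(passwords):
    """Sort passwords into known classes; keep the rest for later analysis."""
    classified, unknown = {}, []
    for pw in passwords:
        for name, test in CLASSIFIERS:
            if test(pw):
                classified.setdefault(name, []).append(pw)
                break
        else:
            unknown.append(pw)
    return classified, unknown

classified, unknown = sieve(["aaaaaa", "QWEasd!@#", "tr0ub4dor"])
print(classified, unknown)
```

A password such as `QWEasd!@#` looks complex but reduces to three short walks along keyboard rows with shift held for some keys, while `tr0ub4dor` falls through to the unknown pile, where a mutation classifier could pick it up later.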
To improve on Wikipedia: Wikipedia has about 84 main categories that I identified, and I used those categories to scrape the first 50 results from Google to improve on the Wikipedia dump. To enable you to create word lists from Google searches, I created a small script called gwordlist, which scrapes the top N Google results based on keywords, or Google dorks, or Google hacks, whatever you want to call them, that you enter. It also creates a word list ordered by occurrence that you can simply pipe into a password cracking tool or anything else you want. So why do you need this? Because a targeted list will usually be more effective than a generic word list. If you're attacking security researchers, you can use data with the specific information or specific keywords that they might use, right? And of course, you can recurse. You can take the resulting list, recurse on that and create an even bigger list. Sieving itself is an iterative approach. First, you identify known elements with the sieving algorithm. You create a rule that describes the composition, and you collect all the data on how a person created the password: what words they used, what numbers, what replacements, what mutations, everything. If you have something that is unknown, store it, analyze it later and create a new classifier. And when you're done, you have a set of classifiers, an algorithm that can classify passwords. In practice, it looks like this. You put in something that looks like a strong password, and it will tell you it's a keyboard pattern. If you look at your keyboard, you'll see that somebody just typed sequential keys and pressed shift. Other weak patterns can be identified in the same way, and you can see how people create them. Or a complicated password, something like this, will be disassembled, and you will identify all the elements of the password, which can be written down as a rule for Unhash like this.
So what you see here is a genuine rule for the Unhash tool that can be executed. For instance, numlen4 is a dictionary of numbers of length four, and dictionary length 11 is all words that have length 11. R is replacement: replace i with one, replace o with zero. And strlen4 is a dictionary with all strings of length four. In the data mining process, when the sieving algorithm classified the passwords, I collected all the data. For instance, all numbers that were four in length were collected, and after that you have a word list of the most used numbers, ordered by occurrence. That tells you the probability that somebody will use a given number. Rules in Unhash are supposed to be expressive, and you can use any combination of word lists, strings and generators that you want to generate candidate passwords. You can add replacements, and you can add uppercasing, with or without permutations. For instance, if you have a password and you want to change s to five, will you change all the s's? Will you change none? Will you change only the first one, or only the second one? If you want to permute it, you don't have to write multiple rules to describe that process. You just set one flag and it will generate all possible permutations of the replacements. You can also write your own Python functions or expressions. The benefit of Unhash rules is that you can use any word list you like. You have the ones that are shipped with Unhash, which were generated by data mining the 32 million password set. You can use gwordlist to create additional word lists by scraping keywords from Google, or you can use your own. I suggest you combine the first two methods: scraping results from Google with gwordlist and using the lists that are shipped with Unhash. The use case here is pretty simple.
The tool was created to execute rule files, and you can pipe its output into any popular password cracking program like John the Ripper or Hashcat. If you look at GitHub, which I think some of you are already doing, you'll see that it's probably the most hideous code you have ever seen. But if there are some Python ninjas here who know how to make it faster, I'm open to suggestions. The interesting part is that I used PyPy, and thanks to the creators of PyPy, that gave a much faster runtime for the algorithm. Now some results. Don't get me wrong, I'll say a few words later about comparing results. If you use a normal brute forcing program, it will uncover some amount of passwords in some amount of time. I used 24 hours as the measuring unit. And what did I do? I used Unhash together with gwordlist: I used gwordlist to generate a word list scraped from the top 10 pages for keywords like LinkedIn, business, recruiting, networking, job and contacts, and used that list in combination with the data-mined lists. And you can see that you get at least 20% better results if you add that approach. Why? Because brute forcing uncovers a lot more of the short passwords that are complex. This tool will fail if you have a password that looks pseudo random. But if you have a password that is based on combinations and mutations, like the ones I showed before, then this kind of approach really helps. As you see here, longer passwords are uncovered much better by this approach. First off, don't think that Unhash is better than John the Ripper, Hashcat or Hashkill. Those tools are awesome, and Solar Designer, the Hashcat team and gat3way from Hashkill do an extremely awesome job. Just use the right tool for the right job.
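To make the replacement permutations concrete (should s become 5 everywhere, nowhere, or only in some positions?), here is a minimal Python sketch of the technique. It is an illustration under stated assumptions, not the Unhash rule engine: the function name `permute_replacements` and the single-character replacement table are made up for the example.

```python
from itertools import product

def permute_replacements(word, replacements):
    """Yield every variant of word in which each replaceable character is
    either kept or substituted, independently of the other positions."""
    # For each position, the choices are (original,) or (original, substitute).
    options = [(ch, replacements[ch]) if ch in replacements else (ch,)
               for ch in word]
    for combo in product(*options):
        yield "".join(combo)

# 'sass' has three replaceable positions, so 2**3 = 8 candidates.
variants = list(permute_replacements("sass", {"s": "5"}))
for candidate in variants:
    print(candidate)
```

Because the candidates are printed one per line, a script like this can feed any cracker that reads candidate passwords from standard input, for example John the Ripper in its `--stdin` mode.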
This may sound a bit complicated, but you can find the example files and all the rule files on GitHub, and you can simply execute them and it will do its magic. Those are two different use cases. Still, if you have a GPU cluster, going GPU is king. But if you have an FPGA board or something like that, or if you want to test slow hashes like bcrypt, scrypt or PBKDF2 and you can't afford a large number of comparisons because the cracking process is slow, then try this approach and maybe it will help. This also helps with passphrases. If you know a part of a passphrase, or you know how the passphrase was generated and how many words it has, then you can brute force it pretty simply. It also helps in custom testing or in custom tools. One thing I see is that data-centric approaches are really interesting. Try to experiment and build something interesting. And when I say build, please don't be evil. It's available on GitHub. It's simple to use. Try it out. And of course, contributors and researchers are welcome. I kept it a bit shorter so you can have more time for Q&A, because if somebody is really interested in this topic, I think they have some questions to ask. You have my Twitter account here, you have my email, you have the GitHub link, and I'm open for questions. Thank you very much for the talk. For questions, please line up in front of the microphones in the room. Are there any questions from the internet? No, there are no questions from the internet. Okay. Questions or comments in the room? Number five. Yeah, sure. You said that you had trouble using a machine learning algorithm. I'm just wondering if you could explain or elaborate on why that was. That was because the machine learning algorithms I tested weren't really well suited, because there are really many classes of passwords. So it's the classic problem of overtraining.
You train your machine learning algorithm to figure out one part of the story, like one class of learned passwords, and then it will fail on the others. One thing you can do is try something like random forests, but they didn't give a good enough classification. The second thing you can do is something like a committee of machines. The sieving algorithm is pretty much inspired by the committee of machines and random forest approaches, because it is much simpler to build a classifier for one type of password, and you can then also get contributions on the ways people create passwords. You can bring much more complex ideas into how to classify passwords. For instance, what if somebody creates a passphrase and then takes the first letter of every word in the passphrase? How will you classify that? With the sieving algorithm, you can classify that much more easily, because you can write a classifier that will give a result for that and give you a probability score. Is it better or not better than something else? But does your sieving algorithm assume that the classes of passwords are independent? Yeah, the password classes are independent, because somebody just combines elements, somebody uses keyboard patterns, somebody combines and replaces characters, somebody just uses a word and replaces a couple of characters. Those are all independent classes. That's why it's much simpler to do it this way. Microphone 1. Hi. I was wondering if Unhash also takes into account specific target attributes like a native language and other things. You could also use a tool to scrape the target's website or something like that, to add it to the word list and make it more target-specific. Is it capable of doing that? On the first part: the machine learning part that I did was based on these dictionaries: English, German, Croatian, French, Spanish, Italian and Dutch. So the rules are derived from those main languages.
The scraping you can do with the gwordlist tool. You can write Google dorks. For instance, you can write N keywords and add a site: restriction with something like .nl. You can add multiple keyword entries with the .nl site restriction, and tell it to scrape the top 100 sites. And you will get a word list that is Dutch only for those keywords. It works in UTF-8, so you won't have any problems. You can use the resulting word list from gwordlist with the Unhash tool. So that's the point here. You can scrape the results from the net with gwordlist, which uses Google. You can use all the Google dorks that you want, for instance restricting the search to a certain country, restricting it to a set of keywords, or telling it to scrape the top 500 sites, and then you can use the result with the rules that are freely available. And one more thing, about password policies: you can see on a site that they have a policy, like you need to use one capital, one special character, or things like that. Is this also implemented, or should we just write a regex to filter out the passwords in the list? First off, you can filter the rules. It is pretty simple to see which rules are not covered by the password policy. You can see the length, and you can see whether a rule has a replacement or not. If the policy requires one special character, you can reverse-grep to drop the rules that don't have a special character, right? So you can custom design it for pretty much any kind of attack you want. That was the point behind Unhash, that you can adapt it to any kind of use case. As I said before, it's not a replacement for John the Ripper, Hashcat or anything else. Those are awesome tools for brute-forcing, and in their own right.
But this is more interesting if you want to attack a passphrase or a slow hash, or if you want to test a specific set of words. Say you have a pen test at a medical facility and you want to use medical-specific words. You can just use gwordlist to scrape them and then use the list with Unhash rules, right? Yes, it would be nice to see a web-based implementation of a word list generator. So you're telling me I should make a cloud cracking service, right? Or cloud word list generation? A word list generator. Okay, good idea. People, if you would like to leave the room, please do that more quietly. It's not that hard. Microphone 4, last question. Thank you for the talk. A quick question: you showed, for one instance, a performance gain of 23%. Have you run any other instances, and do you have any metrics for possible performance gains? Okay, the possible performance gains are really hard to measure correctly. That's why I said don't think this is better than John or Hashcat or anything else, because it depends on the password list you have. Those were general password lists, and as I said, the research data is hard to obtain, because there are not really any large password lists with occurrences that have an equal number of, how should I say, security-conscious users who use hard passwords and normal users. I used 24 hours as a metric because you need to do a lot of experiments, and each experiment lasts for a day, and that's probably the time somebody pen testing without a cluster will spend trying to weed out all the weak passwords, right? That's why I used big password lists in testing, for instance the LinkedIn and Yahoo password lists, Elitehacker and so on. Thank you. Thanks a lot, Toni. Please give him a lot of applause for this talk, because I actually thought that was pretty great.