The title of this talk is Generating Personalised Word List by Analyzing Target's Tweets. Without further ado, I'm just going to pass over, get straight into things, and leave it to you. Thanks very much.

Okay. So, can everyone hear my voice? Is it okay at the back? Sorry. Great. So hello everyone, welcome to my talk, Generating Personalised Word List by Analyzing Target's Tweets. Let me introduce myself first. I'm Utku Sen. I usually do research and write tools on the offensive side of security. I'm currently working for Tear Security. You can find my details on my website, and you can follow me on Twitter if you are interested.

In this presentation, I will start by talking about password guessing attacks. Then I will explain why reducing the word list size is crucial for password guessing. After that, we will see how we can generate word lists based on the target's personal topics. Finally, I will demonstrate the Rhodiola tool, which does that job.

Passwords have been the main security mechanism for our digital accounts since the beginning of the internet. Because of that, passwords are one of the main targets for attackers. Of course, there are lots of different ways an attacker can obtain a target's password. For example, the attacker can prepare a phishing website to trick the target into entering their credentials. Or the attacker can conduct a password guessing attack through brute forcing.

Password guessing attacks fall into two main categories: online and offline attacks. Offline password guessing attacks are usually conducted against captured hashes or encryption keys.
For example, in hash cracking, the attacker calculates the hash of a candidate password and compares it with the target hash. Two variables affect the success other than the password complexity: hardware resources and the type of hashing algorithm. More hardware resources provide more speed and therefore increase the chance of success. The other thing is the hashing algorithm. For example, cracking an MD5 hash will be faster than cracking a bcrypt one.

An online password guessing attack is where the attacker sends username and password combinations to a service like HTTP, SSH, etc., and tries to identify the correct combination by checking the responses from the service. There are lots of different variables that affect the success: for example, our connection speed and the service speed. Also, a website can block our IP address, lock the target's account, etc. The success of an online attack has no positive correlation with hardware resources. Therefore, it's much, much harder than an offline attack. Also, most web applications have password complexity rules where users have to use at least one number, one uppercase letter, special characters, etc. Therefore, reducing the brute force pool to an acceptable size is very important for attackers.

Instead of brute forcing all combinations, we can make smart choices. For example, we can try all the common passwords. If that doesn't work, we can make some smarter combinations. To reduce the combination pool, the Hashcat team created a technique called the mask attack. The main idea is that people choose their passwords with similar patterns; they are not pure gibberish. Therefore, we can define a pattern, which is called a mask, and brute force within its boundaries. For example, let's say our password is Julia1984. With the brute force approach, we need to brute force all nine characters over the whole charset. The formula is 62 to the power of 9, which is a very, very big number.
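To put numbers on the brute force and mask approaches for Julia1984, here is a minimal sketch. It assumes a 62-character charset (26 lowercase letters, 26 uppercase letters, 10 digits) and a mask of one uppercase letter, four lowercase letters, and four digits. The function names are made up for this illustration. The alphabet size is a parameter: 26 letters gives about 119 billion mask combinations, while 29 letters per position gives about 205 billion, which is the order of magnitude quoted in the talk.

```python
# Keyspace sizes for pure brute force versus a hashcat-style mask.
# Function names and the alphabet sizes below are illustrative assumptions.

def brute_force_keyspace(charset_size: int, length: int) -> int:
    """Number of candidates when every position can hold any charset symbol."""
    return charset_size ** length


def mask_keyspace(position_sizes) -> int:
    """Number of candidates when each position has its own, smaller charset."""
    total = 1
    for size in position_sizes:
        total *= size
    return total


def julia_mask_keyspace(letters: int) -> int:
    """Mask for 'Julia1984': 1 uppercase + 4 lowercase letters, then 4 digits."""
    return mask_keyspace([letters] * 5 + [10] * 4)


full = brute_force_keyspace(26 + 26 + 10, 9)
print(f"brute force : {full:,}")
print(f"mask (26)   : {julia_mask_keyspace(26):,}")
print(f"mask (29)   : {julia_mask_keyspace(29):,}")
```

Either way the mask shrinks the pool by roughly five orders of magnitude, which is still far too many requests for an online attack.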
For the mask attack, since we define the pattern, we don't need to brute force everything. There are 29 choices for the first character, which can only be an uppercase letter. The total number of combinations is way smaller than with the pure brute force technique. It's around 200 billion. But of course, that's still too much for online attacks. We usually can't send 200 billion requests to a web server and wait for its responses.

But of course, pure brute force and mask attacks are not the only ways of password guessing. There is also a method that sounds like science fiction, based on smart guessing. For example, in The Hounds of Baskerville episode of Sherlock, Sherlock Holmes checked the personal belongings of the target and guessed the correct password in one shot. So we have a third method now. It seems very unrealistic, but in theory, it's possible to find the Julia1984 password in fewer than three shots. We just need some Sherlock Holmes skills to do that.

Let's assume that the target has posted that tweet and we are a Sherlock Holmes candidate. We can make the following deductions. The target's daughter's name is Julia, and the target loves her very much, since he or she tweets about her. And the target's favorite author is George Orwell, whose most popular book is 1984. So combine them together, and the answer is Julia1984. Is it that simple? We will come back to this later.

According to research conducted at Carnegie Mellon University, most people choose words from their hobbies, sports, movies, etc. for their passwords. This means that most user passwords contain meaningful words that are related to the password's owner. So in theory, we can become a Sherlock Holmes of password cracking. Actually, we can prove that people mostly use meaningful words for their passwords.
When we analyze the leaked MySpace and Ashley Madison password lists and generate the most used masks, we can see that almost 95% of the passwords are formed by sequential alphabetic characters. So there is a high probability that these are meaningful words. Let's try to prove that they actually contain meaningful words.

So what is a meaningful word? We can say that a letter sequence is a meaningful English word if it's listed in an English lexicon. For those who are not familiar with NLP, a lexicon is the complete set of meaningful units in a language. We used the WordNet lexicon for this job. Our analysis shows that almost 40% of the entries in those word lists are included in the WordNet lexicon, hence they are meaningful English words.

Now we need to apply POS tagging, which means part-of-speech tagging, to these words to understand what kind of words they are. POS tagging is the process of finding a word's class. There are eight part-of-speech classes in English. We learned them in English lessons. For example, there are nouns like chair, table, et cetera, and there are verbs like going, eating, et cetera. So we analyzed those words with the help of Python's NLTK library, and the results showed that most of these words are singular nouns.

So let's recap the facts we have so far. First, our analysis shows that people use meaningful words for their passwords. Second, from research conducted by various universities, we know that passwords are mostly based on personal topics. So Sherlock Holmes' method is legit in theory. But can it be done in practice?

What Sherlock Holmes did was analyze the personal topics of the target. Then he combined them in his mind and came up with a candidate password. But how can we do it in real life? To achieve this, we need information about the target and an algorithm that extracts good password candidates from that information. We need a data source about the target, just like Sherlock Holmes had.
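The mask analysis of leaked password lists mentioned earlier in this part of the talk can be sketched roughly as follows. This is only an illustration, not the analysis code used for the talk: the function names and the sample passwords are invented, and the mask symbols follow hashcat's built-in charsets (`?l` lowercase, `?u` uppercase, `?d` digit, `?s` special). Anything that is not an ASCII letter or digit is lumped into `?s` here, which is a simplification.

```python
from collections import Counter

# Hypothetical helper names; sample passwords are made up for illustration.

def to_mask(password: str) -> str:
    """Translate each character into a hashcat-style charset symbol."""
    symbols = []
    for ch in password:
        if ch.isascii() and ch.islower():
            symbols.append("?l")
        elif ch.isascii() and ch.isupper():
            symbols.append("?u")
        elif ch.isascii() and ch.isdigit():
            symbols.append("?d")
        else:
            symbols.append("?s")  # simplification: everything else is "special"
    return "".join(symbols)


def most_used_masks(passwords, top_n=3):
    """Count how often each mask occurs and return the most common ones."""
    return Counter(to_mask(p) for p in passwords).most_common(top_n)


sample = ["julia1984", "tesla2018", "Julia1984", "qwerty", "p@ss"]
for mask, count in most_used_masks(sample):
    print(count, mask)
```

Run over a real leaked list, a ranking like this is what shows whether long runs of alphabetic characters dominate.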
We need a source where we can find the hobbies and other interest areas of the target. Actually, we all know that kind of source. It's Twitter, of course. On Twitter, people tend to post about their hobbies and other interest areas. Since there is a character limit for tweets, users need to write in a more focused way. And this makes things easier for us. We don't need to deal with large, gibberish texts. So let's use Twitter as a data source and try to build our personalized word list generation algorithm.

First of all, we need to gather the target's tweets via Twitter's API. Then we need to get rid of unnecessary words. How do we know if a word is necessary or unnecessary for our job? Firstly, since we are trying to find personalized things, we can remove any stop words, since everybody uses them. So we can remove things like I, my, she, etc. Secondly, as you recall, our research showed that people mostly use nouns for their passwords, so we can remove any verbs. For example, we remove the word "suggest" from that tweet.

As you remember from the previous slides, the leaked word lists were mostly formed by nouns. So we can again apply POS tagging to the rest of the words and detect the most used nouns and proper nouns. For this tweet, the nouns are daughter and author, and the proper nouns are George Orwell and Julia.

Sometimes users combine two meaningful words for their passwords. But of course, they are usually not two random words; they have a semantic similarity. So we also need to combine similar words. We used WordNet's path similarity algorithm to detect the semantic similarity of the words extracted from the tweets. The path similarity algorithm gives us a score between 0 and 1, and we combine two words if the score is greater than 0.12. In the example shown in the slides, we combine "cat-tuck" and "flame-thrower" with each other.

Researchers have also found that some of the most used semantic themes in passwords are locations and years.
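Two of the steps above, stop-word removal and pairing words whose similarity score exceeds 0.12, can be sketched like this. It is a simplified illustration and not Rhodiola's code: the stop-word set is a tiny toy list, the POS tagging step is skipped entirely, and the similarity function is passed in as a parameter so the example runs without any NLP data. In practice that parameter could wrap WordNet's path similarity from NLTK, which may return no score for some word pairs; the sketch treats a missing score as "not similar". All names and sample scores are invented.

```python
import re
from collections import Counter
from itertools import combinations

# Toy stop-word list (an assumption); a real run would use a full list.
STOP_WORDS = {"i", "my", "she", "he", "the", "a", "is", "to", "and"}


def candidate_words(tweets, stop_words=STOP_WORDS):
    """Count alphabetic tokens across tweets, ignoring stop words."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[A-Za-z]+", tweet.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


def pair_similar(words, similarity, threshold=0.12):
    """Join every pair of words whose similarity score exceeds the threshold."""
    paired = []
    for first, second in combinations(words, 2):
        score = similarity(first, second)
        if score is not None and score > threshold:
            paired.append(first + second)
    return paired


# Made-up scores standing in for a real semantic similarity measure.
toy_scores = {frozenset(("cat", "dog")): 0.2, frozenset(("cat", "love")): 0.05}

def toy_similarity(first, second):
    return toy_scores.get(frozenset((first, second)))


counts = candidate_words(["I love my cat", "My cat and the dog"])
print(counts.most_common())
print(pair_similar(["cat", "dog", "love"], toy_similarity))
```

Keeping the similarity measure pluggable makes it easy to try different thresholds or measures on the same extracted words.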
For this job, we send the extracted proper nouns to Wikipedia and parse related years and cities from the results. In this example, we sent George Orwell to Wikipedia and it returned words like London, 1984, etc.

So the last step is combining all of our data. From the example tweet, we got George Orwell, we sent it to Wikipedia, and it returned 1984. Beyond that, we also had Julia as a proper noun. So when we combine all of our data, we will have the correct password, Julia1984, somewhere in the list. So instead of millions of combinations, we could crack this password in 20 or fewer attempts, just like Sherlock Holmes did.

To automate all these processes, I coded a tool named Rhodiola. It is written in Python and mostly based on the NLTK library. It follows the algorithm that I described in the previous sections. Given a Twitter handle, it can automatically compile a personalized word list with elements such as the most used nouns, proper nouns, paired words, and the years and cities related to them. Currently it only supports English, but I will finish the Turkish and German support soon.

You can use Rhodiola in three different modes. In the base mode, Rhodiola takes a Twitter handle as an argument and generates a personalized word list without any fancy stuff. For example, when you give it Elon Musk's username, it generates passwords like Tesla, car, SpaceX, Boring Company 2018, et cetera. In the regex mode, the user can generate additional strings with a provided regex. These generated strings are added to the words as a prefix or suffix. For this mode, Rhodiola takes a regex value as an argument. The regex defines the string placement. With the regex value shown in the slide, it generates passwords like Tesla root 01, Tesla root 02, and so on. In the mask mode, the user can provide hashcat-style mask values for the word list generation.
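Going back to the combination step described earlier in this part of the talk, here is a minimal sketch of joining proper nouns with related years. It is not Rhodiola's implementation: the function name is hypothetical, the years are hard-coded sample values rather than parsed from Wikipedia, and the only variations tried are lowercase and capitalized forms.

```python
# Hypothetical helper; the inputs below are illustrative sample data only.

def build_candidates(proper_nouns, years):
    """Return each noun alone and with every year appended, without duplicates."""
    candidates = []
    seen = set()

    def add(value):
        # Preserve insertion order while skipping repeats.
        if value not in seen:
            seen.add(value)
            candidates.append(value)

    for noun in proper_nouns:
        for base in (noun.lower(), noun.capitalize()):
            add(base)
            for year in years:
                add(base + str(year))
    return candidates


nouns = ["Julia", "Orwell"]   # proper nouns taken from the example tweet
years = [1984, 1903]          # sample years standing in for the Wikipedia step
wordlist = build_candidates(nouns, years)
print(len(wordlist))
print("Julia1984" in wordlist)
```

With two nouns and two years the list stays at a dozen entries, which is the scale at which an online guessing attack becomes feasible.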
In this example command, we used a mask in which the first character is an uppercase letter, the second is a lowercase letter, the third is an uppercase letter again, and so on.

If you don't have Twitter API access, or you want to use another data source, you can bring your own data. Rhodiola gives you two options. In the first one, you provide Rhodiola with a text file that contains lots of harvested text. In the second one, you provide a URL list, and Rhodiola harvests text from these URLs automatically and builds the personalized word list right away. You can download and try the tool from our GitHub page.

Okay, so demo time. For this demo, I will get a Twitter handle from a volunteer in the audience, pass it to Rhodiola, and we will check the results together. So is there anyone who actively uses Twitter in English and can share their username with me? Anyone? Or you can give me some celebrity's username. That's fine too. No one? Excuse me? Sorry. You're not talking to me, you're talking with your friend. Okay, sorry. Okay, so let's choose a random celebrity then. Okay. So last chance. Anyone? DEF CON. Yeah, let's try DEF CON. Let's see what we get.

So now it's downloading tweets from the DEF CON Twitter account. It will probably download around 2,500 tweets, which is Twitter's API limit. It should be done in about 10 seconds. Now, apparently DEF CON's Twitter account has only 1,013 tweets. Now it's trying to detect the most used nouns and proper nouns in the downloaded tweets, and it also cleans the stop words from the data, and so on. So the most used nouns are, obviously, defcon, defcon26, defcon27, NewLiveBatch, OpenShare, China, et cetera, et cetera, village. And now it sends those extracted words to Wikipedia and will parse the related locations and years. Of course, maybe DEF CON's Twitter handle is not the best candidate for this experiment. But you get the idea. Okay, let's check what kind of words we have. We saw them. And some years, some locations.
So you can use these words to brute force DEF CON's Twitter account and let me know about the results. So, some Chinese tweets, some hotel names. Again some, I guess, Chinese tweets. Yeah, and Las Vegas of course. So you get the idea. Let's go back to the slides.

So as a conclusion, since people tend to use words from their interest areas for their passwords and expose those interest areas on Twitter, it's possible for an attacker to create a word list by analyzing a target's tweets. Beyond Twitter, any actor that has much more data about the person will be able to create more accurate word lists. Therefore, users should avoid using words from topics that are exposed on social media. It's better to use random passwords stored in password manager software.

So thank you everybody for listening to me. Any questions? Sorry? The address? Okay. Don't forget to send some stars. And of course you can fork it and open issues. Alright, thanks everybody.