Okay, folks, let's get started today. An announcement: the midterm is in a week. No assignments until after spring break. There'll be a practice midterm released tonight, so I'll post that on Piano Zone and you'll have access to it there. TAs and undergrad TAs are setting up awesome review sessions to help you prepare for the midterm. Any questions on that? Cool.

We've been studying different types of authentication mechanisms, so somebody remind us: what's the most used, most famous type of authentication mechanism? Passwords, right? Okay, so we talked about password-based authentication and the many problems with it. As attackers, how can we try to break authentication, and what are some ways we can do that specifically regarding passwords?

Running through common words. Yeah, we can guess common words. We can do that if we have the hashes, right? We can just hash common words with whatever the algorithm is and see what matches. Or, on the online system, we could just try to log in as that user using different passwords. Yeah, other thing over here? Phishing. We can try to trick somebody into giving us their password, right? Maybe pretend to be a trusted authority, pretend to be the login system, and trick them into giving us their username and password. Awesome.

So we talked about dictionary attacks, right? It's very easy to search all likely passwords. And we have two types of attacks, depending on what type of access we have. An offline dictionary attack is when we've stolen the complementary information. So we know F, the function that maps authentication information to complementary information, and we know the complementary information, so we can repeatedly try different guesses. There are actually lots of really good open source tools for this.
Crack, John the Ripper, Hashcat: different tools that will try to brute force and guess passwords, either from dictionaries or from other types of information.

Online attacks are where we don't have access to the hashes, right? So we have to actually talk to the login system itself and try logging in as the user. And there we may encounter problems like we talked about on Tuesday: there may be a timeout when we're incorrect, so we may not be able to guess as fast as if we were just generating hashes. The website may look at how many incorrect guesses we're making and lock the user's account after a certain number of tries, so that they have to use another way to authenticate to the system, like calling in and proving their identity that way.

So now we can put on our defender hat. How can we maybe prevent fast guessing? Limit the number of guesses from a specific IP? Yes, we can try to limit the number of guesses per account or per IP. We still run into a problem: that's not necessarily going to prevent it. It may prevent access to one account, right? If you only get three incorrect guesses per account, then as an attacker I could just try the most common password on everyone's account, and I'm going to get some of them correct.

User interaction, CAPTCHAs. So does everyone know what a CAPTCHA is? Does anybody remember what it stands for? Can somebody look it up and tell us? Because I don't remember. It's an acronym, something about being able to tell humans from computers. Completely Automated Public Turing test to tell Computers and Humans Apart. Wow, okay, yeah. I feel like I missed all that. Is it HCA? No, it's TCHA. Okay, so why?
I don't think I'll ever remember that. But it's an acronym that has to do with the Turing test. The idea is: how do you give a test that a human can solve and a computer cannot? So what would be a good CAPTCHA test? If I asked what is 5 plus 10, is that a good CAPTCHA test that distinguishes you from an automated system? Why not? A computer is probably better at it than you are. If I gave you a complex division problem, you would probably use a calculator, which is a machine, to do that for you rather than do it by hand. We'll probably get to this in a little bit, but CAPTCHAs are essentially a type of authentication. And what is it specifically trying to authenticate? Are you human? Are you a human or a computer?

So what are some types of CAPTCHAs that you've seen that may be more effective than a math problem? A bar slider, where you have to put it within a certain range. A bar slider? Like what? There's a shadow on the image, and you move the bar to where it is. Oh, interesting. Okay, so there's a picture with a shadow, and you have to drag a slider onto the shadow somehow? Yeah. Okay, so you're using essentially image recognition, right? You have to understand the image and see where they want you to place something based on that. What uses that? Chinese CAPTCHAs. Oh, cool. Interesting. Awesome.

What else? Any other types of CAPTCHAs? Yeah, I think that's reCAPTCHA. They'll show you a bunch of different photos and say mark all the photos that have a sign, a car, whatever in them. And as you mark them, they'll keep showing you more and more until none of them have it. So again, it's another image recognition task that should be difficult for a machine but that we can do pretty easily. What else? Text with a weird shading or whatever.
Yeah, so weird-looking text, maybe with squiggles through it, right? So it'd be like, what is this word here, which you can actually barely read. The idea is that a person would be able to determine this, but it'd be much more difficult for a machine. These are kind of text-based CAPTCHAs. There's also another type of reCAPTCHA that's super interesting: it shows you text snippets from a scan of a book or something, and you fill in what each word is. They usually give you two words, one that they probably know and one that they don't. So they're testing you on one and using your answer on the other to build up their vocabulary and better train their algorithms.

Any other ones? Yeah. There's one I saw recently where it had an image and you had to rotate it until it was the right way up. Oh, cool. I feel like there were tests I took in grade school like that: what would this image be if you rotated it? Yeah, okay, that's cool. So image rotation.

Are there ones now that are invisible, that keep track of how you're moving the mouse around the site? A little bit. I wouldn't necessarily call those CAPTCHAs. They're more like browser fingerprinting or behavior fingerprinting, trying to determine if you're human or not based on what you do. I think CAPTCHAs are more like tests: here's a discrete task I give you, rather than just passively observing. But that's definitely part of what's used to authenticate things. Those are owned by the reCAPTCHA people, like the click-on-the-stop-signs one. You get more of them based on how robotically you've been moving around the site. Interesting. So they add other information there. Auditory ones? Yeah, so there are other ones, right? For people who are visually impaired, there are audio-based CAPTCHAs.
So how good are CAPTCHAs at preventing automated systems? Take a little bit of an economic perspective: a lot of people are using them, so they must be somewhat effective. And even if it's not 100% effective, at least it's going to stop the majority of the background automated systems. I think I've heard of people just paying others something like one cent for every CAPTCHA. You just have 100 people sitting in a room solving them. Yeah, I think the best example of this was Ticketmaster, because they, I think, sued some automated ticket buyers. They have a CAPTCHA on ticket purchases, and what the buyers found is that it's easy to break. You can pay somebody a penny to break it, and if you make $20 off of buying a ticket and then reselling it later, that's totally worth it. So you can pay people to break them.

Some of them you can just break outright. It turned out Ticketmaster was doing word-based CAPTCHAs, but if you refreshed the page enough times, you'd see that there were only about 2,000 different images and they would rotate through them. So you just download all 2,000 images, solve them by hand, and now you have an automated system that can immediately break it.

They also use a technique called CAPTCHA arbitrage or something like that. The basic idea, using Ticketmaster as the example: there's Ticketmaster, which is protected by a CAPTCHA. When your automated system goes to access it, it throws up a CAPTCHA. So you're the bad guy, and you have a bot that goes there and gets a CAPTCHA; I'll call this CAPTCHA C. The trick is how you get people to break CAPTCHAs for you, either by paying or not paying. So let's say there's a free online streaming site, sports, whatever you want.
So there's a free online streaming site that the bot's owner controls, or has paid somebody for, and before a human user can visit this site, it throws them a CAPTCHA. But they have no idea that this CAPTCHA is actually the one from Ticketmaster. They solve it so they can see a free video, usually illegal content or whatever that people can't normally get for free. So you solve the CAPTCHA, the answer is fed into some automated system, and you've solved the CAPTCHA for Ticketmaster, not for this site. So that's the way to do it.

There's also been research on this. The earlier image ones are easy to break, and then what attackers ended up doing was going after the audio version to break the CAPTCHAs. It got to the point, I think, where you could use Google's own speech recognition systems to break their own CAPTCHA systems, because the machine learning tech had gotten so good by then. Anyway, a bit of a tangent, but an interesting authentication mechanism. So yeah, we can definitely think about adding these to stop lower-tier attackers or automated systems, but it's not going to stop a dedicated attacker. If it's worth their time investment to solve or break your CAPTCHAs to break into accounts, they're going to figure it out.

Have you ever seen this on Google when doing Google searches? Yeah. When do you see that? Using a VPN overseas. Okay, overseas VPN, where else? Do you get it here at ASU when you're on the Wi-Fi? No? We all come out as the same IP address, so Google's doing some detection on how many searches come from each IP, and at a certain point you can hit the limit and you'll start getting these CAPTCHAs asking you to prove that you're not a robot. And you're like, you know me, Google, you have 10 years of my emails. What are you doing? Cool. Okay, so yeah, this would be one mechanism.
What else could we use to try to combat this? We talked about this a little bit. We can try to deny access to our hashes: don't give up the hashes of the passwords, so that we can prevent offline attacks. But again, this is very difficult to guarantee. This is kind of a theme of this whole class. You can never really assume that a system is 100% correct, secure, whatever. There's always an avenue for somebody to try to steal your password hashes. We talked about delay, so we can add in a delay when guesses are wrong. This definitely happens. Is there anything else we could do? CAPTCHAs, we talked about that.

Yeah, okay, so that's interesting. We could require policies of users so that their passwords are not easily guessable or found in a dictionary. One way we might do this is to run our own password guessing attacks on their passwords, and if we're able to break them, we tell them to change their passwords. This becomes a little tricky; we talked about all the problems with that, right? Now a user has to create a special password just for your site, because your requirements are different from the password they're used to using. They're much more likely to forget your password, so there are problems there. And no matter how complex you make the requirements, users will use the easiest password that they can get away with. Attackers know this and can figure it out. So it's difficult, but yeah, that's definitely a great approach.

What else? Yeah. I haven't actually seen anywhere that does this, but you could ask for increasingly more information. So the first few times you ask for a password, and then it's like, oh, you've gotten it wrong a lot of times, so now we want your password and your recovery email. Okay, interesting. Yeah, so you could maybe require additional types of authentication.
So after failing a few times, you require more pieces of information that an attacker may not have, because they're just guessing passwords. Yeah, so what does that actually do? Hey, do you use your Android account or your passwords? Yeah, it's kind of interesting. It lets the user know that maybe somebody is trying to guess their password, but I guess what can they actually do? So it's kind of an interesting trade-off, right? You kind of open it up to more types of abuse with this. Especially if they have no way to act on it, what are they going to do? I don't know, maybe change their password, but I think most users don't actually know that their password is not good, right?

We talked about this a little bit: you can use something that's based off of the keyboard that looks very random but actually is not. You can basically look at your keyboard: ZXCVB looks like a fairly random distribution of letters, but it's very clearly the bottom row of your keyboard. And so if that's your password, that's one of the ones attackers are likely to guess. So yeah, that's super interesting. And it's a similar thing with the reset password functions.

So a lot of websites have this restriction that the password must have at least one capital letter and a few numbers, but I actually think that makes it easier for the attacker, because you're putting patterns in the password for them. The attacker now knows he has to look for numbers and special characters. Cool. Okay, let's think about that. So specifically when we're talking about brute forcing here, right? We can do a little bit of math. So let's think: an eight-character password, all lowercase.
So how many possible passwords are there? Let's say we're just trying to brute force it: how many tries would you need to brute force this? 26 to the eighth. What was it? 26 to the eighth. What's that, about 208 billion? And now what if I said an eight-character password with lower plus upper, right? So we require at least one uppercase character. You don't actually know where it is, so you have to brute force all of that. Maybe you can restrict it a little bit, but let's just say we now have to brute force all lowercase and uppercase characters. 52 to the eighth, which is about 53 trillion. And then if we do lower plus upper plus number plus special, what would that be? 95. 95 to the eighth? I'm going to take your word for it. What's that, 6.6 times 10 to the eighth? Oh, 15, I'm sorry, 10 to the 15th. Yeah. Cool.

So we can actually get a huge increase in the search space. But our users aren't actually choosing a random password from the search space, right? That's why dictionary searching is so effective. If we just had lowercase, we can look for every dictionary word in lowercase and try to guess that way. But if we require an uppercase letter, then the attacker at least has to take every word in the dictionary and transform it to have at least one uppercase character, so you're making the work more difficult, and further on with the numbers and special characters. So you're increasing the work that an attacker has to do to guess.

So I've seen that with my friends' passwords the first letter is capital and the rest is lowercase, then they follow it with numbers, and I think if most of the users follow that pattern, it just... But do they? That's a question.
Do the users follow that pattern? There are some patterns, but yeah, it all depends. I guess it's strictly better than not having the requirement in terms of guessability. It can be much better to think about passphrases, which we talked about a little bit. Requiring very long passwords can be good, except that users aren't used to doing that, so they may not have a passphrase that they know right off the bat. Because that's the other thing: if you increase the exponent here, you get a much larger search space. Even with all lowercase letters, a huge 30-character passphrase is going to be much more secure, unless the attacker knows that and looks for dictionary words. So that's the other tricky part.

There's a hand. Even if an attacker were looking for dictionary words, if you had a 30-character password, that's, I don't know, maybe four or five words. Now the user can randomly choose any word, so it's the number of words in the English language to, like, the fourth. Yeah, how many words are there in English, though? I have no idea. It's very large, I assume. Can somebody look it up? I don't have a system that has it. 171,000? That seems like a lot. Let's just do common dictionary words. Well, I guess we could go with that, but that seems like a lot of words, especially if you think about what words users are going to choose. It's probably going to be more like, let's go with 20,000. Is that fair? Have you ever taken those online tests to figure out how many words you know? No? Yeah, it says that people use between 20,000 and 30,000. Cool, thank you. That sounds great.

Okay, so we'll use 20,000, right? So we have 20,000 words, and let's say three choices, for a three-word guess. So how many guesses is that? 20,000 to the third. I guess that's still pretty good. What's that, eight trillion?
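The numbers called out in this part of the discussion can be checked with a few lines of Python. This is only a sketch of the arithmetic: the alphabet sizes (26, 52, 95) and the 20,000-word vocabulary are the figures used in the lecture, and the helper name `search_space` is made up for illustration.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of candidates when each of `length` positions is chosen
    independently from `alphabet_size` symbols."""
    return alphabet_size ** length


# Figures from the lecture; the labels are only for printing.
cases = {
    "8 chars, lowercase": search_space(26, 8),
    "8 chars, lower + upper": search_space(52, 8),
    "8 chars, lower + upper + digit + special": search_space(95, 8),
    "3 words from a 20,000-word vocabulary": search_space(20_000, 3),
}

for label, size in cases.items():
    print(f"{label}: {size:,} (about {size:.1e})")
```

Note that a three-word passphrase from a 20,000-word list (8 trillion) is smaller than the 52-to-the-eighth space (about 53 trillion), which is why the lecture says the two are only on the same order.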
We're on the order of where we were with lowercase and uppercase characters. It's actually not that much; as we'll see, depending on how fast you can guess, this is not a crazy search space. We'll have to do this another day. But yeah, that's the tricky thing. If somebody knows your password is built this way, right, four words are going to hold up a lot better against guessing. And then if you add another thing, like capitalizing different words of your password, that would be much better, right? Yeah, but once you start adding capitalization, it becomes increasingly harder to remember it yourself. Correct. Unless it's a lyric or something, but then if people started using passphrases in earnest, attackers would start figuring out what the most common passphrases are.

Why did you raise that to the three? Was that implying a three-word password? Yeah, so this would be the search space for a three-word password where each word is randomly chosen from English. It's also not a phrase, so maybe you could narrow the space down depending on what actually makes sense for three words. Cool.

Okay, so one other technique we can use, and this gets at the fundamental problem of online versus offline attacks, right? We can do everything we want to the login function, but if somebody breaks into our system and steals all of our hashes, then they can attack them offline, right? We talked, either Tuesday or last week, about adding salts to hashes, which helps because attackers have to break each password individually. But we have, again, this fundamental problem. When we talk about cryptographic hash functions, do we want those to be slow or fast? We want people to actually use these cryptographic techniques and these hashes, right? Encryption and decryption, we want them to be as fast as possible.
We want to be able to hash a 30-gigabyte file in order to calculate and validate the integrity of that file, right? We want to do that literally as fast as possible. So hash functions are designed with speed in mind and may be implemented in hardware, so they can be insanely fast. They can also be run in parallel, across multiple machines and everything. So it's kind of interesting. We want to use hashing for the one-way functionality of a hash, so you can't go back. But this also enables attackers to use the speed of hash functions to easily guess passwords. So we'll see in a bit: what if we use different types of hash functions? We talked about how applying the hash function over and over will inherently slow it down. There are other ways to build slow hashing functions that we would never want to use for checking the integrity of a file, but for passwords, this is exactly what we want. Cool.

So one of the super cool things about password guessing: think about it, you're an attacker. You want to guess all eight-character passwords. How do you do this? It's a brute forcing attack. It's a scripted algorithm; there's nothing different here. So what do you do? Create some way to split up the full input set and then do a bunch of them in parallel. Sure. Or what if you didn't want to do that? What's the simplest way? Just do them all in a row. Start with all A's, change the last letter to B, C, D, E, all the way to Z, then change the second-to-last character to B and do A, B, C through Z again. You just search through all possibilities, and every time, you take that candidate password, you hash it, and you compare it with the hash that you're looking for, right?
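The in-a-row search just described can be sketched in a few lines of Python. This is a minimal illustration, not a real cracking tool: the function name `brute_force` is made up, SHA-256 from the standard library stands in for whatever hash the target system uses, and the demo uses 3-character passwords so it finishes instantly.

```python
import hashlib
import itertools
import string


def brute_force(target_hex, length, alphabet=string.ascii_lowercase):
    """Try every string of `length` characters over `alphabet` in order
    (aaa, aab, ..., zzz), hash each one, and compare with the target hash."""
    for candidate in itertools.product(alphabet, repeat=length):
        guess = "".join(candidate)
        if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None  # the password was not in this search space


# Demo with a tiny search space (26**3 = 17,576 candidates).
stolen_hash = hashlib.sha256(b"cab").hexdigest()
print(brute_force(stolen_hash, 3))
```

`itertools.product` walks the candidates in exactly the order described: the last character cycles fastest, then the second to last, and so on.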
And you know that if that password is in the search space, if it's an eight-character password, you will find it eventually, right? Now we can talk about other techniques to speed this up. We can distribute this computation. We can use all the cores on your machine. If you have an eight-core machine with maybe 16 logical cores with hyper-threading, you can split the search space up into 16ths and have each core search through a different part of it. If you leverage machines on something like Amazon, you can spin up 1,000 machines with four cores each, all searching through the search space, so you can search through it 4,000 times faster. You can use your GPU, which is able to do computations like hashing very effectively in a massively parallel way. All these types of techniques. But fundamentally, you're computing as you go through the search space, right? All a's to all z's, computing the hash and then comparing it. So you're always doing the computation.

Don't they have hash tables, where they already have the hashes for commonly used passwords? Yeah, so why not store all these passwords and all these hashes? You've already done the computation, so you store it in a table that says this password maps to this hash, and then later you can trivially break it. This is a classic example of a trade-off between computation and storage, right? If you store the results of your computation, you can reuse them later to break other people's hashes.

What's the problem here? Let's say your hash is 512 bits, or some number of bytes. We'll go 32 bytes. No, that's a bad hash. Oh no, 32 bytes would be good. Yeah, we'll go 512 bits. So how much storage do you need to store the hashes of all lowercase 8-character passwords at 512 bits each? It's a lot. We could ask Google. What is it? 106 trillion. 106 trillion bits? Yeah. I'll just say one trillion bits. 106.
106, but a lot larger. Very large. So how much is that in gigabytes, or something that we'd be able to comprehend better than just 1s and 0s? 13,000, I think. Roughly 13 terabytes. Is that about right? It's actually not insane. We may have to double check this math. I'm getting suspicious. I'm multiplying the two numbers you started with. I got 13. Yeah, 13 terabytes. That's actually not insane. That's pretty trivial. You can buy a 13 or 15 terabyte hard drive for $200, $300 on Amazon right now. I thought it would be much higher.

So we can fundamentally do this, and it's actually a really cool idea. You can take this idea further, and we won't get into the details of the technique, but essentially you can make a trade-off between the amount of hashing, the amount of computation you do, and the amount of storage you use. The full table is a constant-time lookup. You don't have to do any computation. You just take the hash, look it up in your huge table, and then you have the password. You can make some trade-offs here to do some interesting things, and this is essentially what's called rainbow tables.

This works, for instance, for MD5. For MD5 you can download rainbow tables. So take a rainbow table of all 1-8 character alphanumeric passwords, so here we have all lowercase, uppercase, and numbers. Storing every hash the way we calculated before took roughly 13 terabytes for lowercase alone. Using rainbow tables and this trade-off, we can do it in 127 gigabytes, which is a huge savings but still a pretty large storage requirement. And the bigger your space gets, the bigger the table: for instance, 9 characters is 690 gigabytes. You can actually look these up. There are torrents of these that you can download and play with, which is pretty fun.
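To double check the 13-terabyte figure for the full lookup table (as opposed to a rainbow table), here is a small sketch of the multiplication. It counts only the hash output, one 512-bit hash per 8-character lowercase password, and ignores the space for the passwords themselves and any index; the function name `full_table_size` is made up for illustration.

```python
def full_table_size(alphabet_size: int, length: int, hash_bits: int):
    """Entries, total bits, and total bytes for a table that stores one
    hash per candidate password (hash output only, no index overhead)."""
    entries = alphabet_size ** length
    total_bits = entries * hash_bits
    return entries, total_bits, total_bits // 8


entries, bits, size_bytes = full_table_size(26, 8, 512)
print(f"{entries:,} passwords")
print(f"{bits:,} bits (about {bits / 1e12:.0f} trillion)")
print(f"{size_bytes / 1e12:.2f} terabytes")
```

The result agrees with the numbers in the discussion: about 106 trillion bits, or roughly 13 terabytes.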
So yeah, rainbow tables essentially take this idea of storing the passwords and the hashes and make this trade-off so you can do it without storing every single one. But the storage requirements are still pretty large, as you can see. So now we have this problem: we know we can create these tables that can break passwords very quickly if it's MD5, all lowercase, or MD5 alphanumeric up to 9 characters. It could be a rainbow table or even the full hash table here, right? We have every password and we've calculated the hash of each password. But this only works if the authentication scheme just hashes the password.

So how does that change if we add salts to the password? Would you add the salt length to the password length? Yeah, so now we need to hash the salt with the password, right? And now I can't just do what we did in this example with 13 terabytes, because it's not just 26 to the 8: I also have to do this for every possible salt. So depending on how big your salt is, you need to create a rainbow table per salt, which gets to be prohibitively expensive. This is why salts come up in many different scenarios. One we saw was not being able to easily tell that two users have the same password. But also, in terms of brute force guessing, we can't just use one rainbow table and break all MD5 passwords; we have to create a rainbow table for each unique salt, which is going to get insanely expensive.

So we said nine characters is 690 gigabytes. Now let's go with even a 16-bit salt, right? So we have nine characters plus a 16-bit salt, which means we need a 690-gigabyte table times all the different salts, which is 2 to the 16. So we've now pushed the requirements up to about 45 petabytes, right?
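The 45-petabyte estimate is just the per-table size multiplied by the number of possible salts. A quick sketch, using the 690-gigabyte and 16-bit figures from the lecture and decimal units (1 PB = 1,000,000 GB); the function name is made up for illustration.

```python
def salted_table_storage_gb(table_gb: int, salt_bits: int) -> int:
    """Storage needed if a separate table must be built for every
    possible salt value (2**salt_bits of them)."""
    return table_gb * 2 ** salt_bits


total_gb = salted_table_storage_gb(690, 16)
total_pb = total_gb / 1_000_000
print(f"{2 ** 16:,} salts x 690 GB = {total_gb:,} GB (about {total_pb:.1f} PB)")
```

Every extra salt bit doubles this figure, which is why even a modest salt makes precomputed tables impractical.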
Which is going to be much, much more difficult to actually do. So now we force people to give up this rainbow table idea and actually brute force each password individually with its given salt. I think we basically already talked about the salt: it's a random value that's publicly known and added before hashing. You can think of it in two different ways. Each password hash is now unique, so if two users have the same password, we won't be able to easily tell just based on the hashes, because they'll have different salts and so different hashes. Another way to think about it is as having a different F for every user. Questions on salts?

How would that work when the user comes back? Because you're adding a random value before it's hashed, and then you have to store that. So this is the key thing. Before, if we think about our users table in our database, we would have a username and a password, which would be the hashed password. So we have a user foo, and we store the hash of bar, whatever that password is. Now, in this case, we need to store the username, the password hash, and the salt. So username foo would have a salt of, let's say, 15, and the stored password would be the hash of 15 combined with the password. You store the salt alongside the password hash and the username. That way, when they log in, they say which user they're trying to log in as, foo; you grab the salt, add that salt to the password they typed, hash it, and check that against the stored hash. If that's the same, then it's correct. Any other questions? Salts?
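The store-and-check flow described here can be sketched as follows. This is a minimal illustration under several assumptions: an in-memory dict stands in for the users table, the names `register`, `check_login`, and `hash_password` are made up, the salt is 16 random bytes rather than a small number like 15, and plain SHA-256 is used only to keep the example in the standard library. A real system would use a deliberately slow password hash.

```python
import hashlib
import hmac
import secrets

# Stand-in for the users table: username -> (salt, password_hash).
users = {}


def hash_password(salt: bytes, password: str) -> bytes:
    """Hash the salt together with the password."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # random, stored in the clear
    users[username] = (salt, hash_password(salt, password))


def check_login(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record  # grab this user's salt
    attempt = hash_password(salt, password)
    return hmac.compare_digest(stored_hash, attempt)


register("foo", "bar")
register("baz", "bar")  # same password, different salt
print(check_login("foo", "bar"), check_login("foo", "wrong"))
print(users["foo"][1] != users["baz"][1])  # the stored hashes differ
```

Because each user gets a fresh salt, the two accounts with the same password end up with different stored hashes, which is the property discussed above.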
So a salt isn't going to be user facing. The user would never ever see this. It's just something used when they log in. Every user has a different salt, so even if two users have the same password, they hash to different values. So for each salt you need a separate rainbow table; to break it you have to create a rainbow table per salt. So for this one, if I'm using MD5, which I shouldn't use because it's a bad hash, I would need to use the specific rainbow table that was built for salt 15. Then somebody else would have salt 16, and that rainbow table would be completely useless; you'd need a new rainbow table for that specific salt 16. This is why it essentially becomes infeasible to use rainbow tables: you've made the requirements so huge that attackers can't possibly build them all.

But that doesn't mean they can't guess your password once they have this, right? They can brute force it, because they have the username, the password hash, and the salt. So for user foo they can just try all these combinations; they know the salt, so they just add it, hash each guess, and check if the hash matches. But the other nice thing is that they now need to break each user independently, based on the salt. You can't just guess password foo or password bar once and see exactly who in the database has that same password; you need to do it for every salt as well. Cool.

And then the other super interesting thing is this other aspect we talked about, slow hashing. We want password hashing operations to be slow, and to be slow in different ways. Like we talked about, we can do hashing on our CPU and we can multi-thread and parallelize that operation. We can even build an ASIC, a hardware-based machine where all it does is compute MD5 or whatever, and those can operate insanely fast. So the idea of slow hashing is
that you have a controllable work factor that determines how long it takes to compute, and you also have a salt in the hash. So they've now designed different types of hash functions specifically to be slow. bcrypt, for example, is a slow hash function: when you run it, it takes roughly a quarter of a second every time you want to compute it, and you control the work factor that determines how long it takes. So it's intentionally designed to be slow. I used to use it on my submission server, and it took about 300 milliseconds. Other interesting things are other types of slow hashes, specifically scrypt. scrypt is another hash function, one that is designed to use a lot of memory. This is the key problem for hardware-based ASICs or FPGAs: memory is pretty expensive. So if you make it so that every hash takes 200 megabytes of memory, that's out of the range of doing it massively parallel on small hardware devices, and it's going to force people to use a CPU, or maybe a GPU, if they want to break it. But even then it takes a long time to actually compute. I really like this. It's in stark contrast to how we usually talk about cryptographic primitives: we want them to be literally as fast as possible so everyone can use them. But here, being as fast as possible makes the attacker's job easier, so we need to come up with ways to make the attacker's job more difficult.

So we talked a little bit about password reuse. We talked about how we have a lot of different passwords on a lot of different sites, and we'll look a little at why this is such a big problem. In 2013, 3.5 billion Yahoo usernames and passwords were leaked. That's billion with a B: billions of usernames and passwords were released. So think about that. I think those were hashes of passwords too, but still, as people start breaking them, you can detect these. In more recent cases, 412 million Adult Friend Finder user accounts were leaked, Adobe had 152 million, and eBay had
145 million usernames and passwords released. So it doesn't matter how big your site is or how many users you have: you should consider your database of usernames and passwords being hacked an inevitability. This should be a good argument for why you shouldn't store passwords in plain text, because if you think of the havoc that's going to be caused by 3.5 billion people's usernames and passwords being leaked at once, and all of the other sites those users reuse them on, it's just kind of crazy.

So we'll look at the Adobe breach very quickly. There's a blog post linked at the bottom if you want to go check out more about this. What they did, and this is always great, is try to dig in and understand the data. Usually what you get is just the dump of the passwords, sorry, of the database. So here's the database: you have the user ID, the username, the email address, the password data, which we'll look at in a little bit and which is basically base64 encoded, and a nice password hint column. So why do services have this password hint feature? To try to remember what the password is. Maybe you can say, oh, this is your easy password or your difficult password or whatever. But then you can be like this person, whose hint is "try qwerty123", right? So you can literally know the exact password without ever breaking or hashing anything, because their password hint literally contains their password.

So what's QWERTY? Yeah, the first six characters in the upper left of a normal keyboard. Allegedly that's where the name of the layout came from, and also you can spell "typewriter" all along the top row, which was, I guess, important for salespeople who didn't know where the keys were when demonstrating typewriters. Anyway, so again, it's a combination that comes from the keyboard, which looks kind of random if you've never, I guess, looked at a keyboard before. So you have QWERTY, and then you have 123, because they probably required you
to have numbers or something in your password. Okay, so then they dug in and tried to figure out what this password data is. It's base64-encoded data, which you're all very familiar with now. And this is funny: somebody's hint said this is probably their MySpace password, and this one is their regular password. Fascinating. Cool. So what they did is decode the base64 data, and they saw that these were essentially hashes, but what's weird is that the data wasn't constant size. You would expect a hash to be constant size, right? You do SHA-256 or whatever and you get a fixed number of bytes for every single password. But here you have different amounts. These are all in hex after they were decoded, and you have some that were one chunk, I don't know exactly how long this is, some that were two, and some that were three. So why is this? Why do you think they did that? Like, a legacy tool had a length requirement, probably? Sure. It was faster? Okay, maybe.

Let's look at this. We can use the password hints a little bit. So you have one chunk: "same old same old". Two chunks: "they'll never guess". Then we have this one that says it's "virtuously long", which is a weird way to put it. So it seems like the longer the password is, the more of these chunks there are. Yeah, so for the second and third ones, the second chunk of output is exactly the same. That's super interesting. What if it's not actually a hash at all? Maybe something we should look at is that they're encrypting it. So they're actually not even hashing it; they're encrypting it with some key. That's how we can see that there are different blocks: based on the size of the password, if the password is longer than the block size, it'll have two blocks of output. And we can also see, based on the repetition, that they're using ECB mode, not CBC. With CBC, that second block should not be the same when the first blocks are different. We can reason through this from the data dump even though we don't have the key, and in this case
it actually is the same. So we can stop here and think: what's the difference between hashing passwords and encrypting passwords? With encryption, if you have the key, you can go back. Yeah, so with encryption, if you have the key, you can go back. So how does that impact it as an authentication mechanism? If you eventually get the key, based on someone else's password, say qwerty123, you can then get everyone's instantly. So that would be a known-plaintext attack, but with good crypto algorithms, even if you know the password and the encrypted output, you can't derive the key. Even if I told you this is qwerty123 and you have the encrypted version of that, you shouldn't be able to derive the key. Or, if the database was leaked, there's a good chance the key was actually leaked. Yeah, so if the database was leaked, it's fairly likely that the key is going to be leaked, right? Whenever these systems are running, the application needs to have access to the key in order to do the login: it has to encrypt your password with the key and then compare it with what's stored here. So the application needs access to the key, and likely, if somebody has access to this database, they probably have access to the key too.

What is the benefit of doing this? It feels like maybe they were decrypting the stored data at run time for some reason, maybe if decrypting is faster than encrypting? So when would an application want to decrypt your password? Yeah, to send you your password in a password reset. Because they said: we're not storing plain-text passwords, because that's dumb, but we want users to be able to get their password back. If they need to reset their password, we don't change it; we can send it to them in an email and remind them of their password. We can do that this way. But now we have this problem if somebody breaks into our system. I don't know if the original people who broke in did, but there would be a good chance that they probably stole that key, or could
steal the key, depending on how it's stored. I would say it's above plain-text passwords, in the sense that the attacker needs to do a little more work to steal that key, but that doesn't mean it's impossible to steal the key. So it's a super interesting design decision, but hashing is strictly better. It also means that any Adobe person who has access to that key can decrypt all the passwords, right? The thing we want to prevent is plain-text passwords, so this is maybe one type of design solution, but hashing, and specifically slow hashing, is much, much better. Do you know what the key was? I do not know what the key was. I don't know if it's known, actually.

But we can look at this a little bit. We can look at the count here and see that the password data length is 8 bytes, 16, 24, 32, and so on. The other thing is that there are no salts in this encryption, right? Because they're not using CBC, they're not using a random initialization vector. So that means 1.6% of people in this data set of, what was it, 152 million, all use the same password, because their password data is exactly the same. So now we have the problem that if you're just encrypting the password, you also need to worry about salts and other types of things. So 1.6% all have the same data value, 0.45% have the second most common one, and the third was 0.44%. And this is super cool: we can maybe try to figure out what these passwords are. We don't have the key, but we do know all these people have the same password, so we can use other data. This is from the RockYou data set, which was 188,000 passwords that were leaked without any hashing, so we know the passwords. So we can compute on this data set: the most frequent password was 1, 2, 3, 4, 5, 6, the second one was password, the third was 1, 2, 3, 4, 5, 6,
7, 8, which is really funny. Why is it not 1, 2, 3, 4, 5, 6, 7? Why two more? I guess people think it's much better, but they all end up using it. The fourth one is lifehack, which is very interesting. So why do you think that's the fourth most common here? Is that a forum? Maybe it was the name of the site that got hacked: the thing was called lifehack, so that's what people used as their password. Next was qwerty, then abc123, and then just six ones.

So using that, you can group all of the accounts that have the same password data and look at the password hints again. Having this password hint gives the attacker a lot of information, because you can see all of these people that have this one value: one hint says the numbers 1 through 6, and three of them say exactly what the password is. The interesting thing is that it doesn't even take more than one person. Without salts, we know that 1.6% of users have the same password, and it only takes one of them to put the password in their hint, and now we've broken, what's that, over a million people out of the roughly 150 million. We know at least a million people's password on Adobe was 123456. We can look at other ones. This next one, which was this 8FDA value, is 12345678. The third most frequent one, which is this one, is password, where the hint says the password is password. This other one, E5D8, the hints say it's qwerty. And it looks like from this example that their system didn't allow you to put the password in as the actual hint, so what did they do? They just added spaces in between the characters of their password. So qwerty was the fourth most used one, and then the next one was six ones, 111111. So it actually matches up really nicely with other data breaches. This is a clear example of people choosing the same passwords in a real authentication system, and we can understand what it is by looking at this data dump and thinking it through. Any questions on this?
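The grouping step described above (collect every account with identical password data, then read their hints together) can be sketched in a few lines. This is a minimal sketch, not the analysis the blog post authors actually ran: the rows, the placeholder password-data strings, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows in the shape described above: (password_data, hint).
# The values are invented placeholders, not taken from the real dump.
rows = [
    ("AAAA1111", "numbers 1-6"),
    ("AAAA1111", "123456"),
    ("BBBB2222", "same old same old"),
    ("AAAA1111", ""),
    ("CCCC3333", "q w e r t y"),
    ("AAAA1111", "one to six"),
]

def group_by_password_data(rows):
    # Without a salt or IV, identical passwords give identical password data,
    # so grouping on that column groups users by password.
    groups = defaultdict(list)
    for password_data, hint in rows:
        groups[password_data].append(hint)
    return groups

def most_shared(groups):
    # Largest groups first: these are the most commonly chosen passwords.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)

groups = group_by_password_data(rows)
ranked = most_shared(groups)
top_data, top_hints = ranked[0]
share = len(top_hints) / len(rows)  # fraction of accounts sharing this value
```

In this toy data, one careless hint (`"123456"`) in the largest group is enough to reveal the password for every account in that group, which is the point being made about the missing salt.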
Yeah? So what would make, if you look at two and three, the second half of the encrypted data the same? Is that a function of the algorithm itself? In this part here, or are you talking about this part here? So this part tells me that they're using encryption, and that they're encrypting it chunk by chunk, and it just means the second chunk is exactly the same. Just like the penguin picture we saw. So we don't know exactly what that chunk is yet, until we break it, but we know that chunk is the same as the other chunk. Let's see, is this the E2A3 707 one? Yeah. So we actually know from this that it's 1 through 8, and we know 1 through 6 is one block, so that's either 7 and 8 or just 8. But isn't it the same one as for password, though? Oh, interesting. It's the same as for password, though. That's why. Okay, so then in this case it must be... yeah, okay, that makes sense. We didn't get into this, but this is the padding. The password "password" is, what, 8 characters? Yeah, so it's 8 characters, so it fills up exactly one block, and you need padding to tell you that, so you have another block that's all padding. So that must be exactly what this is. So all of them that have that block are exactly 8 characters long. What about 123456, which has no extra block? Which one? Yeah, so it's only 6 long, so it has padding at the end of its own block and doesn't need an extra block. So anyone whose password is exactly 8 characters should have that same last block. Yeah, that's good. Learning more stuff. Cool.

Okay, so some things we can think about, and people have already been talking about this a little bit, are other techniques and approaches we can use to try to address this. So what is a password manager?
It stores passwords for you. It can maybe actually make a password for you, generate a random password for you. There are companies that do this: LastPass, 1Password, and others; actually, I don't know all of them, but some of them. Any other ones that I haven't mentioned? Firefox has an integrated password manager. Yeah, Firefox has one, Chrome has one, Safari definitely has one, so you have them on both mobile and the web. What else? KeePass, the open-source one, I believe, right? Yeah, that's another one.

So what's the benefit here? You never have to actually know what the password is, because it's something random. Rather than a password that you have to remember, it can actually be randomly drawn from that whole space of billions or trillions of possibilities. You can generate random bits and then transform that into a password, and then you have a super complex password for everything. What else? Password managers can send you notices if they find out there's been a data breach, or they can warn you if you're reusing a password, so that you think, oh, I shouldn't do that. Yeah, they can actually help you with your password hygiene, because if they have access to all your passwords, they can say, hey, you're using the same password on multiple accounts, maybe go change it. Or they can check whether your password has been in a breach, so maybe change that password. Is that a real phrase, password hygiene? I don't know, I think I may have just made it up. Sounds good. I'm sure somebody said it before me. What are the cons? Maybe, I mean, what about your browser? Do you log in to your browser?
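The "generate random bits and transform them into a password" idea mentioned above can be sketched as follows. This is a hedged illustration only: the alphabet, the default length of 20, and the function names are assumptions made here, not how any particular password manager does it.

```python
import math
import secrets
import string

# Assumed alphabet: letters, digits, and ASCII punctuation (94 symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20, alphabet: str = ALPHABET) -> str:
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))

def entropy_bits(length: int, alphabet_size: int) -> float:
    # Each independently chosen symbol contributes log2(alphabet_size) bits.
    return length * math.log2(alphabet_size)

pw = generate_password()
bits = entropy_bits(len(pw), len(ALPHABET))
```

Because every character is drawn uniformly and independently, the password has no dictionary structure for an attacker to exploit, and nobody has to memorize it since the manager stores it.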
Not necessarily. With the browser-based ones it all depends, but some of them will store your passwords in a file, right? So if somebody is able to get into one device, they can maybe get into all of the rest of your accounts. It can actually be super difficult if you're on a machine that doesn't have it. If I ever had to log into this lecture machine here, I don't know my ASU password, so I'd have to pull out my iPhone, open the password manager, look at the password, and type it in from there, because I have no idea what it is. You could try to sync it across to another device and it could go wrong. Yeah, that can be super annoying. Then the app has to handle syncing between your mobile device and your desktop or laptop and the other devices that you own, and now you have multiple copies of your thing. How does it deal with conflicts and things? What kind of stuff? Yeah, so that can be a huge thing. Bugs like that, where it makes you change your password to a new one, but it didn't remember it, and now your new password is gone and you have to be able to reset it.

Yeah? In most browsers, when it autofills your passwords, someone can right-click on the password field, pull up Inspect so the HTML source opens, and change the input type from password to text, and that reveals the dots as the actual password. I think autofill is really bad. Well, it depends. Autofill can actually be good in the sense of preventing phishing. Specifically, if somebody has set up a different phishing domain, your password manager would see that and wouldn't autofill your password on that domain, so that would be a hint to you that something is very wrong. And even Chrome has just started doing this: if you use their password manager and they see you typing the password for one of your accounts onto a different domain, it will actually stop you and warn you that, hey, this is maybe a phishing attack. So there are actually interesting parts there. But they have found
vulnerabilities in the way this works. There were vulnerabilities found where they were doing their domain checking wrong: they were only matching the regular expression at the start, so if you had a domain name like google.com.attacker.com, they would think that was google.com and autofill your Google password, which an attacker could then easily steal.

Yeah, so the way to think about all these things is that they're basically keeping track of your passwords and generating a random password per website, and then the next level is that this should be encrypted or locked with a master password. Good things, bad things? I think one thing with password managers, and this is a weird thing to say, is they make death easier to handle. Like, if someone dies and you know their master password, you can access all their accounts, which is helpful. Yeah, that's actually a legitimate concern, right? If you have your master password written down in a safe somewhere, or you give it to an attorney so that it can be given to your next of kin when you die or whatever, you can actually give somebody access to all of your accounts. I don't know why this is, but when I open my password manager in Chrome, it prompts me for my computer's password. Windows prompts you for your Windows password? Maybe it's storing it as a higher-privileged administrator user, so that other apps can't get to it. I don't know the exact implementation details. That's, I guess, strictly better than it used to be, where it was just stored as a file on your computer, so if anybody stole it, they could get access to it. So what's the risk of putting all your passwords in one place? If somebody breaks in using your master password, you're done for. What else? Yeah, if you don't sync and you lose whatever machine or device that was on, you're in
for a rough ride. Yeah, so if you don't have any backups, right. Exactly. So if the password manager is just storing a local file that's encrypted with your master password, and you have no way to access backups, or you haven't backed it up, and your machine goes out the door, you'll have a very bad time. You'll have to reset the password on everything. Yeah? You have to trust your password manager a lot. Yeah, so should you trust them? Why do you need to trust them a lot? I like that. They go out of business, and then what? So you're relying on them for your passwords and they go out of business. What else? What are some other threats? They can get breached, right? Think about putting a big target on your back: if you're the site that stores everyone's passwords, that's a pretty big target. So they can get breached and all their passwords can get stolen. What else? Most of them are online services, meaning, like, what if you're not on Wi-Fi? Yeah, okay, so then you have to deal with offline syncing and all these other issues. What else? They can change their privacy policies. Yeah, they can change their policy. And how do you know that they don't have access to your passwords? It's maybe encrypted with your master password, but how do you know they're not storing your master password somewhere, right? You're literally using their software every time you want to access your password. So there's that. And think about compromise: not just could somebody compromise their data, but somebody could compromise their app, so that when you type in your master password, it sends your password to an attacker, or any time it autofills, it sends passwords to an attacker, right? So not only do you have to trust them, you have to trust their software distribution, you have to trust their backup policy, and you have to trust that they're going to stay in business. So what do you think? Good things,
bad things? 1Password has let a lot of security researchers look over what they're doing and see if they can break it. I think that's helpful, obviously, for building trust. Yeah, and increasing your assurance. Otherwise, you can look at how they actually do things. For instance, I'm more familiar with LastPass, because that's what I use. You can look and read about their encryption scheme, where basically your entire password list is encrypted with your master password, and they store this encrypted blob; without your master password, they can't read anything. Again, there's still trust: you're trusting them that they're not accidentally leaking it or whatever. You can also look at how they've responded to incidents, like security researchers finding bugs and vulnerabilities, which definitely happens. How do they deal with that, and all that?

So it's definitely a tradeoff. So what do you think, is it a worthwhile tradeoff? I would say yes. You now only have to remember one password, and if it's only one password you have to remember, you can probably make that pretty hard for a computer to guess, and then every other account is now secure. Yeah, so in some sense, you're trading off this problem of unguessable passwords per website, creating a unique password per website, against one password database that you have access to. It's also better than what some people talk about, creating a file on their machine with their passwords. That could be a good alternative, but this is a little bit better: it can be encrypted and all kinds of stuff, so it's actually extra protection on top of that. I personally really like the strategy of adapting a base password to the site, using, like, parts of the URL, and I would actually rather do that than use a password manager. Yeah, so it's definitely a different type of tradeoff. If you told someone your algorithm, or let's say one or two of your
passwords leaked in plain text, a dedicated attacker could probably figure out your algorithm and then maybe break the rest of your sites. That would be the one downside of that. But yeah, it's definitely different tradeoffs. When I was younger, I definitely used to use the same password everywhere and not a different one for each site. Yeah, it's pretty crazy. So it really impacts you personally. I use LastPass. I have a few passwords that I know, that are difficult, that I've memorized, like my email: I need to get into my email, so that's a constant password. I have a different password for my LastPass master password. But then basically almost all the other sites are randomly generated by LastPass. Like I said, I don't even know my ASU password, so don't ask me for it. There are other aspects to think about, but it can definitely help a lot, especially with this password reuse problem. But definitely tradeoffs. Thank you.