Recall that we normally store at least some user identity, a username for example, and we need to store something about the user's password. That information is created when the user registers with the system. Later, when the user wants to log in, they supply their username and password and the system checks: it does a match between the stored values and the supplied values. If they're the same, everything's okay and they log in. The problem we're working on is this. Say we store the passwords in a database, just a list of IDs and passwords, one entry for every user. If a malicious user gets access to our system through other means, maybe through some other security flaw, and can read the database, then the malicious user has discovered everyone's password. That's a problem, because we should design our system, and especially our authentication system, such that even if there are other flaws in the security, a malicious user cannot find the passwords of our users. So what we'd like to do is not store the password in the clear, because then it's easy once the attacker finds that database. First approach: encrypt the password, with symmetric key encryption for example. Then if the malicious user gets this database and reads these values, they need the key to find the password. But the problem is that we need to store the key somewhere. If we store the key on the system in some file, then the malicious user can access that file, get the key and therefore decrypt all the passwords. So it's not much of an improvement in this case. We'd need to somehow keep that key secure. We can't require the user to type in the key all the time, so it needs to be stored somewhere, in memory or on disk. So this doesn't help much, because we need to store the secret key.
If we can't keep the database secret, then there's not much chance that we can keep the secret key secret either. So the next approach is to store a hash of the password. And this takes advantage of the fact that our hash function is hard to calculate the inverse. It's easy to calculate the hash of the password and get the hash value, but it's hard given just that hash value to go back and find the original password. That's our characteristic or property of our hash function. It's a one-way function. In this case, if the attacker can gain access to our database, which is our list of IDs and hash values, then it's hard for them to find out the original passwords. And we start to do a little bit of analysis and move towards an attack on such a case. And I'll just summarize some of the numbers we were looking at. So the idea is don't store the username and password like the top table. Instead, store the username and the hash of the password, the hash value. So these are the MD5 hashes of these passwords. Such that if the attacker can read this database, what they need to do is take the hash value and find the corresponding password. That's the attacker's aim now. Now, the first way the attacker can try to take the hash value and to find the password is to do an attack on the hash function and effectively try and find the inverse. So we say it's practically impossible to find the inverse of the hash that is given the hash value find the password. Well, the solution for general hash functions takes amount of effort which is related to the length of the hash value. So if you analyze hash functions, the amount of time it takes to take a hash value and find the input where the n-bit hash takes 2 to the power of n operations. So that's effectively a brute force attack on the hash function. It takes 2 to the power of n operations for n-bit hash value. 
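The "store the hash, not the password" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `hashed_db`, `register` and `login` are made up for the example, the password spelling `mysecret` is a guess at what is on the slide, and MD5 is used only because it is the hash in the lecture's example.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as 32 hex characters (128 bits)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Hypothetical user table: username -> hash of the password.
# The password itself is never stored.
hashed_db = {}

def register(username: str, password: str) -> None:
    hashed_db[username] = md5_hex(password)

def login(username: str, password: str) -> bool:
    stored = hashed_db.get(username)
    # Hash the supplied password and compare it with the stored hash.
    return stored is not None and stored == md5_hex(password)

register("john", "mysecret")
print(hashed_db["john"])          # this hex string is all an attacker would see
print(login("john", "mysecret"))  # True
print(login("john", "wrong"))     # False
```

The system never needs to invert the hash: it only recomputes it from whatever the user types and compares.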
In my example with a 128-bit MD5 hash value, what the attacker would need to do, given the hash value, to find the password is make 2 to the power of 128 attempts. If we can make attempts at the rate of 10 to the power of 9 per second, then that takes more than 10 to the power of 21 years. So that's why we say that it's practically impossible, given the hash value, to go back and find the password with a brute force approach. It simply takes too long. Even if it was a thousand times faster, there's not much change here. So even if we use a thousand computers to do this all at once, we still don't improve the time much. So breaking the hash function by calculating the inverse is not possible in practice, but one thing is possible, and that is using the fact that the input is a password. Assume the passwords are fixed in length, let's say 8 characters. Let's say we've designed the system such that the user must choose an 8 character password. So every user chooses one of exactly 8 characters, not more, not less, and they choose from the characters on the keyboard. What are the characters on the keyboard? I said there were 94, and I listed them before, so let me show you quickly. That's the main set of characters you can type on a standard keyboard. Different keyboards may have other character sets, but we have 26 lowercase, 26 uppercase, 10 digits, that's 62 characters, plus another 32 punctuation characters, which I've listed there. So let's say our passwords are selected from these 94 characters. Now let's say a user chooses a random password. In most cases they won't, but let's assume they do. Then how many possible values can they choose from? Well, 8 characters. The first character I choose in my password is one of these 94, so I've got 94 choices. The second character in my password is one of these 94. It may be the same one, so I've got another 94 choices for the second character.
The third character is one of the 94, and the eighth character is one of these 94 characters if I choose randomly. So the number of possible choices in that case is 94 times 94 times 94, 8 times, which is 94 to the power of 8. So if we have an 8 character password, 94 possible characters to choose from, the number of possible passwords is 94 to the power of 8. So now from an attacker's perspective, what they do is that they consider all possible passwords. Remember, they have hash values. They need to find the corresponding password for a particular user. So what they can do is, if we list all of those possible passwords, I've listed here password P1, P2, P94 to the power of 8. If we have a large list here, then the attacker takes the first one, calculates the hash of that, and they'll get a hash value. And now they compare that hash value to the hash value here that we're looking for. This is 0, 6, C, 2, so on. If it matches, that means P1 is the password that was used as an input to the hash function here. That is the password is, what do we have at the top? My secret. If it doesn't match, they try the next password. Get a hash value, compare. If matched, good, we've found the password. If not, keep moving. Keep trying other passwords. The worst case from the attacker's perspective is that they'd have to try all the possible passwords. The number of attempts is 94 to the power of 8. So how long does this take from the attacker's perspective? Well, it depends upon how many passwords they need to try and how long each attempt takes. And the most time-consuming part of each attempt is calculating the hash. If you think what we need to do is take some password, calculate the hash, compare to some given value. Comparison operators are quite fast, relative to calculating the hash. So the time it takes to try all 94 to the power of 8 really depends upon the number of passwords and how long one hash takes. Because that's the slowest operation. 
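The try-every-password attack just described can be sketched as follows. To keep it runnable in well under a second, the sketch assumes a toy alphabet of 26 lowercase letters and 3-character passwords instead of 94 characters and 8; the function name `brute_force` and the password `dog` are made up for the illustration.

```python
import hashlib
from itertools import product

def md5_hex(candidate: str) -> str:
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()

def brute_force(target_hash, alphabet, length):
    """Hash every password of the given length and compare with the target.

    Returns (password, attempts), or (None, attempts) if nothing matched.
    """
    attempts = 0
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        attempts += 1
        if md5_hex(candidate) == target_hash:
            return candidate, attempts
    return None, attempts

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # toy alphabet, assumed for the demo
stolen_hash = md5_hex("dog")             # stands in for a value read from the database

password, attempts = brute_force(stolen_hash, ALPHABET, 3)
print(password, attempts)
print(len(ALPHABET) ** 3)                # worst case: every candidate is tried
```

Almost all of the work inside the loop is the call to `md5_hex`; the string comparison is cheap by comparison, which is why the hash speed determines the attack time.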
How long does it take to calculate hash values? It depends upon the hash function. You can try it. You can use OpenSSL to do a speed test on your computer to see how long or how many hashes your computer can calculate per second. There are websites that will tell us some information about them. It depends upon your computer. I'll go to... Most computers are slower than... So most CPUs are slower than GPUs. So most people nowadays use graphics cards to calculate hashes. They're designed in such a way that they work very well with calculating hashes. So this web page gives some data of different GPUs and so the different models and some different hash functions and the speed. So we're looking at MD5 just for our example. And the speed for MD5 for these GPUs, the maximum here is about 1,451 million hashes per second. About 1.4 billion hashes per second. That's the speed that you can do hashes with that device. And there's some other devices listed there. The fastest ones are the AMD Radeon cards. Let's see if I can find the maximum. Approaching here, 10 billion hashes per second for this card. So that's 10 to the power of 10 hashes per second. So with this card, if you buy it and you use it to try and calculate MD5 hashes, you can do about 10 to the power of 10 hashes per second. Of course, if you have different hardware, different speeds. So now we have 94 to the power of 8 passwords to try. We can do it at a speed of 10 to the power of 10 hashes per second. How many seconds does it take? Simply 94 to the power of 8 divided by 10 to the power of 10. Calculator, we have 94 to the power of 8 hash values or passwords to try. For each password, we calculate the hash at a speed of 10 to the power of 10 hashes per second. So the answer here will give us the number of seconds it takes, convert seconds to hours, and convert hours to days, divide by 24, seven days. So if you run that hardware calculating hashes, it takes about seven days to find the password. 
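The calculator step can be reproduced with a short script. The two constants are the figures quoted in the lecture (94 characters, 10 to the power of 10 MD5 hashes per second); the function name `worst_case_days` is made up for the example, and real hardware will give a different rate.

```python
CHARSET_SIZE = 94             # characters on the keyboard, from the lecture
HASHES_PER_SECOND = 10 ** 10  # GPU rate quoted in the lecture

def worst_case_days(length: int, rate: int = HASHES_PER_SECOND) -> float:
    """Days needed to hash every password of the given length."""
    candidates = CHARSET_SIZE ** length
    seconds = candidates / rate
    return seconds / 3600 / 24   # seconds -> hours -> days

print(f"{CHARSET_SIZE ** 8:.2e} candidate passwords")
print(f"{worst_case_days(8):.1f} days in the worst case")
```

Passing a different `length` or `rate` shows how the time changes with the password length and with the attacker's hardware.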
In the worst case, that's the worst case. Which is not so bad. If you want to go faster, buy more hardware. Buy 10 devices and you go 10 times faster. Seven days compared to 10 to the power of 21 years. So this is practical. Using the brute force on the hash is not practical. So what the attacker must do, if you store the hash of passwords, the attacker can try all possible passwords. Seven days, not too bad. How can we make it harder for the attacker? So it took seven days in this example. How can we make it so it's longer for the attacker? Seven years. So that it's not reasonable for them to do the attack. What can we do, some simple things? How to make this slower? Sorry, longer password, okay? So let's say my password was nine characters, not eight. Everything else the same. It becomes 94 to the power of nine. All this value times by 94. Okay, we have 94 times the number of passwords to try. So it takes 94 times longer. So 94 times seven days, it now takes two years for the attacker. By adding one more character, then it's about 100 times slower. 94 times slower, which is I think 600 days or so. About two years. So that solves the problem effectively. Two years, no one's going to try. All right, again, we can increase the hardware. But we know longer passwords are more inconvenient for users. So it's not always a possible solution. Nine characters. Think about your passwords. Think about how many of them are nine characters long? Okay, not everyone. How many are random? Okay, this is assuming passwords are random. It's much easier when they're structured. What else can we do? Okay, yes, increasing the password length increases the time it takes. What else? How can we slow it down? Sorry. Okay, this is what we call an offline attack. So delay the attempts that the attacker can take. We're assuming the attacker, let's say, has downloaded the database of hash values. They have the hash values. 
Then they go to their own computer and try to calculate the hash of many passwords. So we cannot control the number of attempts they make. This is an offline attack, where the attacker can do whatever they like. We have no control over how many attempts they can make, because they're doing it on their own computer. But we still can slow it down. How can we slow this down? Increase the password length, or decrease the speed at which the attacker can try hashes. How? Different algorithms can be performed at different speeds. These results, it's hard to read, I know, but this speed is for MD5, one hash function. This is for SHA1, a different hash function. MD5 is three or four times faster than SHA1. And there are other hash functions. So you can choose hash functions which are slower, which makes it take longer for the attacker to try them. In fact, some hash functions now, especially for passwords, allow you to set a parameter. Effectively, they don't do a hash once, they repeat the hash multiple times, which makes it even slower. So if you can slow down the speed at which the attacker can try, then you increase the time it takes them. And that's a common thing today, to use one that is slow to calculate. Not so slow that it takes 10 seconds to do one hash, but slow enough: if you slow the hash down by a factor of 100, you've increased the attack time by a factor of 100. So slow down the calculation of hashes or increase the number of hashes to calculate. Of course, those two approaches are not very useful if we're already using MD5, or if we've got no control of the hash function, or we cannot slow down the attack. So another approach is to introduce a new value called a salt. No, we skipped one thing. Let's come back to our attack that takes seven days. So this attack takes seven days. Not too bad. Let's make it faster. What we do is we calculate the hashes of all these 94 to the power of 8 passwords and store them in a big database.
The next time we want to break a hash value, we just do a lookup on that database. That is, we store these values, the password and the hash value. Now when I have a hash value as an attacker, all I do is take my hash value and look through this column until I find the value, and I've immediately found the password. I don't need to recalculate the hash. It's been calculated for me already. So we have a large table that stores these values, and to find the password for a particular hash, if you know the hash, you just look it up. And a lookup in some database or in some data structure is much, much faster than calculating a hash. Instead of 10 to the power of 10 per second, it may be a thousand times faster, bringing this down to seconds or minutes. And that's what happens in practice. Someone goes away and spends a lot of money and seven days or a month to calculate all these values. Then they sell it to other people so that the other people don't have to recalculate. The other people just go and buy a database that stores all of these values and then they can quickly look up the values. Here's a website that sells, we'll talk about rainbow tables in a moment, but effectively sells a list of passwords and the hash values. The one that we were looking at is MD5, eight characters. Actually this is one to eight characters, so passwords of one, two, three, up to eight characters. The number of values is about 6 by 10 to the power of 15 possible passwords, which is approximately 94 to the power of eight. You can't download this database, but they've created a database that stores all these values, and you can pay them a thousand dollars and they'll send you a hard disk. What you do as the attacker then is just do a lookup, which is very, very fast. It doesn't take seven days. They've got results that take five minutes, three minutes to find the password.
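The pre-calculated lookup can be sketched with an ordinary dictionary. As an assumption to keep it small, the table below covers only 3-character lowercase passwords; the names `build_table` and `table` are made up for the example. This is the raw, uncompressed form of the idea, not a rainbow table.

```python
import hashlib
from itertools import product

def md5_hex(candidate: str) -> str:
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()

def build_table(alphabet, length):
    """Pre-calculate hash -> password for every password of the given length."""
    table = {}
    for chars in product(alphabet, repeat=length):
        password = "".join(chars)
        table[md5_hex(password)] = password
    return table

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # toy alphabet, assumed for the demo
table = build_table(ALPHABET, 3)         # the expensive step, done once

# Later, each stolen hash costs one lookup and no hashing at all.
stolen = [md5_hex("cat"), md5_hex("dog")]
print([table.get(h) for h in stolen])
```

The cost has moved from time to storage: the table holds one entry per possible password, and it can be reused against any number of stolen hashes.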
So that's a problem because you can reuse this database, only take seven days on the first case. Let's look at how big this database is. I want to store all these values of password and hash value in some database. How big is the database? Let's calculate and we'll turn to that website. What do we have? We have 94 to the power of eight entries in our database because that's the number of passwords in our example. Each entry we store a password and a hash value. One entry, a password, how long is the password? Eight bytes, eight characters. The password we're assuming is eight characters. To store that, how many bytes do we need? Let's say one byte, one character, so eight bytes. And we also store the hash of the password, the hash value. How big is the hash value? If we're using MD5, it's 128 bits, which is 16 bytes. MD5 always produces a 128-bit hash or a 16-byte hash. What we do is for every password, which is eight bytes in length, we calculate the hash value and we store them. Password, hash, password, hash. Store them in a database. So every entry we have 24 bytes and there are 94 to the power of eight entries. So how many bytes do we need? Total size of our database is 94 to the power of eight times 24 bytes per entry. Calculator, 24 bytes times 94 to the power of eight. That's how many bytes we need in our database. Let's convert it to terabytes. Okay, hard disks measured in terabytes. What's a terabyte? Ten to the power of 12 bytes. A gigabyte, ten to the power of nine, a terabyte, ten to the power of 12. So that's the number of terabytes we need to store this database. Anyone got a hard drive this size? 146,000 terabytes? Not practical. About 146,000 terabytes. So this attack in the very basic way of calculating all hash values for all possible passwords, storing them in a database, and then looking up the database to find the password. In theory works, but in practice, if we need to store 146,000 terabytes of data, it's not practical. Compress the data. 
This assumes we don't have any compression, so we can apply some compression. How good is compression? You compress a file, how much smaller is it after you compress if it's text? Let's say we have a factor of 1,000. We compress all this data, and it's down to 146 terabytes. Much less, but still, who has 146 terabytes? You need a lot of money to even have that amount, okay? So it's not something you can do at home. But it turns out there are special ways to compress or to store this data such that it can be stored in a very, very small space relative to the total amount needed. And we're not going to go through the approach, but there are some data structures that allow you to store the password and hash values such that it gets much, much smaller. And those data structures are called rainbow tables. So it's a way to store this information in a very compressed form. So the raw form, 146,000 terabytes, you can use what's called a rainbow table that allows to store the same information in much smaller space. How much space? Let's have a look at this website. This website has done that. They've calculated, and let's zoom in a little bit. We're looking for MD5. In the second row here, they've calculated the hashes of all passwords of one character in length, two characters, three characters, up until eight characters in length, okay? Which is about the same size as our example. So the total number of passwords is this number, 6 by 10 to the 15, which is about 94 to the power of 8. So it's about the same number of passwords as our example. We calculated in the raw form that would need about 146,000 terabytes. They have stored in a rainbow table 576 gigabytes, and they'll sell that to you on a hard disk. It's for the cost of a hard disk plus the cost of them selling that service. It's about $1,000 to buy that. Maybe there's others as well. 
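The storage numbers can be checked with a few lines of arithmetic. All the inputs are the lecture's figures (8-byte passwords, 16-byte MD5 values, 94 to the power of 8 entries, a 576 gigabyte rainbow table); the variable names are just for the example.

```python
PASSWORD_BYTES = 8      # one byte per character, 8 characters
MD5_BYTES = 16          # 128-bit hash value
ENTRIES = 94 ** 8       # one entry per possible password

raw_bytes = ENTRIES * (PASSWORD_BYTES + MD5_BYTES)
raw_terabytes = raw_bytes / 10 ** 12      # 1 terabyte = 10^12 bytes

RAINBOW_TABLE_GB = 576                    # size quoted by the website
ratio = raw_bytes / (RAINBOW_TABLE_GB * 10 ** 9)

print(f"raw table: about {raw_terabytes:,.0f} terabytes")
print(f"rainbow table is about {ratio:,.0f} times smaller")
```

So the rainbow table is smaller than the raw list by a factor in the hundreds of thousands, which is what turns the attack from impractical into something that fits on one disk.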
So rainbow tables are a data structure that allows to store the hash values and passwords in a much, much more compressed form, which makes it easier for the attacker. And that's just one example, 576 gigabytes. And it varies on different data sets. So now, what does the attacker do? If they've got your database of hash values, so they know the hash value, they want to find the password for your users. They go and either generate their own rainbow table. Maybe it takes them seven days to do so. Or even faster, they buy a rainbow table or they bought it before and they're reusing it. And they just do a lookup. And a lookup from a 576 gigabyte database, the website gives some statistics. It takes in the order of five to 30 minutes, in most cases, depending upon your hardware. So a lookup for that hash value is closer to, let's say, less than one hour that is given a hash value searched through the database for that hash value. Once you've found it, you've found the password. It depends on the value, but less than an hour to find the password now. So really what an attacker needs to do if they already have a rainbow table, takes them an hour to find your password. Not very secure. So we still have a problem. From the system designer's perspective, we want to make it such that the attacker cannot find your password. Using these approaches of trying all passwords, and more importantly, using a table that someone's already created of all passwords, we can get from the attacker's point of view down to less than an hour, which leads to the next solution. The next solution tries to defeat this attack to make it longer for the attacker, so it's not possible. And the next solution is... Let's go back to our slides. A brute force attack we saw talk about seven days. How to speed that up is to pre-calculate the hash values. Get someone else to do it for us, and then we just download or buy the database. That's where we talk about pre-calculated hash values. 
And then we just need to do a lookup, which is much, much faster than calculating the hash values. And we talked about if we don't do any compression, the problem with pre-calculated hash values is that the data structure is in the order of thousands of terabytes. But with special compression techniques called rainbow tables, the amount of data is usually in the order of terabytes or less. That's manageable. We can buy a terabyte disk quite easily. So the effect is that using rainbow tables from the attacker's perspective, it reduces the search time much faster to find the password, but at the increased cost of storage. We need to store some large database. How do we stop that? Longer passwords, or slower hash algorithms, and or using a salt. Let's look at the third approach. And the final approach, and the recommended approach for storing passwords. Instead of storing a hash of the password, we take the ID of the user. We generate a random value. When the user registers, so the user has their ID and password, the system generates a random value. We'll call this the salt. And we store the salt, this random value, and we store a hash of the password combined with the salt, concatenated. So just take the password of the user, the value they chose, take this random value, combine them together, and hash that. And in our database, on our system, we store the ID, the salt, and the hash of the password combined with the salt. This salt is some random s-bit value. Don't worry about the meaning of the name at the moment, but just think of it as some random value. Now, when the user logs in, this is stored in the database. What happens when the user logs in? Let's go back to our example. Here's an example. So coming back to our first case. First case, store the password in the clear, top table. Not very good. Second case, store the hash of the password. Much better, but with the use of rainbow tables, it's still possible for attackers to work out the corresponding password. 
Third approach, store three values in our database. The username that John selected, or that's given to John. The system generates a random salt value. In this example, I've taken some random characters, five characters here. So the system generates this. The user does not select this. The user selected a password. John selected the password MySecret. The system combines his password with the generated salt, calculates the hash of them, and stores that here. And same for the other users. So that's what we store in our database now. Now, John tries to log in. What happens? Let's draw and see what happens when John attempts to log in. John submits. So John, when he logs into the system, so here's our system, stores the database, he sends his ID and password. His ID is John, or his username is John. His password was MySecret. He sends that in a secure channel to the system, and the system needs to check. Is this really John? That's the goal here. So what the system does is, with the ID, it looks up in its database here, OK, the ID... Actually, his ID was John. His password was MySecret. The system looks for the ID in the database. OK, John, that's his first row. So now we extract the salt value, A4H star1, just some random characters. The system does that, and combines this provided password with the salt value. It concatenates them. So we get MySecret, the provided password, combined with those characters. And the system then calculates the hash of those values. They get a hash value as an output. If it matches this value, we pass. We authenticate it. If it doesn't match, then we're not authenticated. Because if we have the same password as the registered password, it should be the same hash value. Because we have the same inputs. We're using the same salt. We don't use a different value. We always use the same salt for John. So we'll be able to log in in this case. So that's the normal case. Now, how does this help us? 
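Registration and login with a salt can be sketched like this. It follows the scheme described above: a random salt per user, and a hash of the password concatenated with the salt. The names `users`, `new_salt`, `salted_hash`, `register` and `login` are made up for the example, the 5-character salt drawn from the 94 keyboard characters mirrors the slide, and the password spelling is an assumption.

```python
import hashlib
import secrets
import string

# 26 lowercase + 26 uppercase + 10 digits + 32 punctuation = 94 characters
KEYBOARD = string.ascii_letters + string.digits + string.punctuation

def new_salt(length: int = 5) -> str:
    """Generate a random salt; the system picks it, not the user."""
    return "".join(secrets.choice(KEYBOARD) for _ in range(length))

def salted_hash(password: str, salt: str) -> str:
    """Hash of the password concatenated with the salt."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()

users = {}  # username -> (salt, hash of password + salt)

def register(username: str, password: str) -> None:
    salt = new_salt()
    users[username] = (salt, salted_hash(password, salt))

def login(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False
    salt, stored = record
    # Same password and same stored salt give the same hash value.
    return stored == salted_hash(password, salt)

register("john", "mysecret")
print(users["john"])               # (salt, hash): both are stored, neither is secret
print(login("john", "mysecret"))   # True
print(login("john", "guess"))      # False
```

Because the salt is looked up from the user's own row at login, the user never sees it or types it.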
What does an attacker need to do to find the password? The same as before, the attacker, we assume, has access to this database. What we want to do is make it hard for the attacker, given this database, to find the password. The first thing they can do is try all possible passwords again. That is, they have their 94 to the power of 8 passwords. They take the salt value in the database, combine it with P1, calculate the hash, and compare against this one. If it matches, we've found the password. If not, try the next password. That is, do the normal attack of trying all passwords and calculating the hash. It takes seven days, we've calculated. It's the same as before. Nothing has changed from that perspective. The attacker still has to try all possible passwords. Let's make that a bit clearer. The first attack. The attacker takes password P1, some eight characters, combines it with the salt for John, which was these five characters, and then calculates the hash. They get a hash value. If it matches the stored hash value, BA58 and so on, then we've found John's password, because P1 must be the original password. If not, we try another password, P2, and then P3, and we keep going. I won't write it all out, but in the worst case there are 94 to the power of 8 passwords. So that's the first attack: try all possible passwords with this fixed salt. Note that the attacker knows the salt. It's stored in the database. It's not secret. How long does this take? Well, 94 to the power of 8 passwords at a hash speed of 10 to the power of 10 per second, the same as before. It's still about 7 days. So we haven't defeated that attack. The attacker can still find the password in 7 days. But what using the salt does is defeat the attack that uses the pre-calculated set of hash values, the rainbow table. The rainbow table cut the time down to less than 1 hour.
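Both points can be seen in a small experiment: a table pre-calculated without the salt no longer finds the password, while trying every password with the known salt still does. The sketch assumes the same toy setting as before (3 lowercase letters), the salt is written as `A4H*1` as a guess at the value read out from the slide, and the function name `brute_force_salted` is made up.

```python
import hashlib
from itertools import product

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

ALPHABET = "abcdefghijklmnopqrstuvwxyz"   # toy alphabet, assumed for the demo
SALT = "A4H*1"                            # not secret: it sits in the stolen database
stored_hash = md5_hex("dog" + SALT)       # what the system stored for this user

# Attack with pre-calculated values: the table was built without any salt.
unsalted_table = {
    md5_hex("".join(chars)): "".join(chars)
    for chars in product(ALPHABET, repeat=3)
}
print(unsalted_table.get(stored_hash))    # lookup fails: the salted hash is not in the table

# Attack by trying all passwords: append the known salt to every candidate.
def brute_force_salted(target_hash, salt, alphabet, length):
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if md5_hex(candidate + salt) == target_hash:
            return candidate
    return None

print(brute_force_salted(stored_hash, SALT, ALPHABET, 3))   # still finds the password
```

The brute force does the same number of hash calculations as without a salt, but its results are only useful against the one user who has this salt.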
But can we use a rainbow table? So a different attack is to try a rainbow table. So a rainbow table stores all possible hash values of all possible... stores the hash values of all passwords. But a single rainbow table will not use the same... will not include a salt value in it. If we want to have a rainbow table for every possible salt, we need many rainbow tables. The number of rainbow tables depends upon the number of possible salt values. Let's try and calculate that and see if it's a bit clearer. Let's go back. Remember our rainbow table, I'll just note it here, was our data structure that allowed the attacker was about 576 gigabytes. That was the size and allowed this attacker to break in less than 1 hour by doing a lookup. And what it does is it effectively stores P1 and the hash of P1 and P2 and the hash of P2 for all possible passwords. So that was our rainbow table. It stored the hash values for all passwords. And we stored it in such a data structure called a rainbow table so that the size is small and then we just look up the hash value in this column and we find the password. It takes less than an hour, that attack. But we can't use this in the attack now because the rainbow table doesn't include the salt value. We've got a different input. By introducing the salt, we need to have a rainbow table that doesn't have a hash of the password but has a hash of the password concatenated with the salt. The attacker doesn't know in advance what the salt will be. It's random. So to use pre-calculated values to download to buy a rainbow table you need to buy one for each possible salt value because you don't know in advance what salt was used. So that means if we go to this website which sells rainbow tables they have a rainbow table for a set of passwords with no salt value. But if I want a rainbow table for the salt value A4 H star 1 then it must have been calculated with that specific salt value. Fine, that's easy. 
But I don't know which salt value I need it for. So what an attacker would need to do is calculate the rainbow tables for all possible salt values. Then they would sell them on their website: rainbow table for salt 1, rainbow table for salt 2, salt 3, and then we could perform an attack. But the number of possible salt values depends upon the length of our salt. Mine was 5 characters, and 5 characters, as we calculated before, is about equivalent to 32 bits. In my example I used a 32 bit salt. That is, I chose a 32 bit random number and just represented it in these characters. A 32 bit binary value is chosen. How many possible salt values are there? 2 to the power of 32, about 4 billion. Now from the attacker's point of view, if they want to pre-calculate rainbow tables, they need to pre-calculate one for each possible salt value, because they don't know the value in advance. So an attacker, like that website that's pre-calculated the rainbow tables, would need to do one for salt 1, salt 2, salt 3, for all 4 billion salt values. Which is not possible, because we've just increased the time to pre-calculate by a factor of 4 billion. Not 7 days, but do that 4 billion times. And we've increased the size to store by a factor of 4 billion. So not 576 gigabytes, but 576 gigabytes times 4 billion. So the attacker cannot pre-calculate all possible hash values with all possible salts. They can do it for one value, yes, but when I have my password chosen when I register, I may have a different salt value, and therefore they cannot use the pre-calculated values. Everyone 100% clear on this? Questions? Is it combined at the end? Okay, so when we combine the salt and the password, how do we combine it? The front or the end, it doesn't matter, as long as it's defined. And it doesn't have to be mixed.
Remember we take say my password my secret and the salt which was those characters we combine them just at the end if we take the hash we'll get a random hash value. It doesn't matter if they're front it would give us different values but it doesn't add any security as to where it is it makes no difference. We have an example we had what do we have? My secret and we took the md5 sum and I think originally we got the value of zero this is the hash value of my secret but now we combine it with this salt what was it? A4H star 1 we get this hash value so that's all we say when I combine just add the salt at the end we get a different hash value what the attacker would need to do with pre calculated values is that have to have all possible passwords all possible 8 character passwords with this particular salt value stored in a database but they don't know what salt value I have in advance because maybe I have a different random value so therefore they cannot have all those pre calculated salt values so back to your question doesn't matter where you join them as long as it's defined if you join at the start we'll get a different hash value but the security comes in the fact that the attacker cannot pre calculate the hash values it cannot download a database or buy a database that has all the hash values because we need a database for each salt value further questions we have a quiz when next lecture Thursday okay first quiz in the class this is an important concept it can be quite confusing yes but it's a very important concept because it's a major security floor practical security floor in many especially online systems many websites don't use this approach and an attacker gets access to their database and the attacker releases millions of users passwords posts them on a website so it makes all these passwords available so many large companies have been Adobe, Sony and others people have gained access to their database of passwords and if they're not stored in the right 
way, then someone now has access to all of those users' registered passwords.

Now, rainbow tables. Let's look at the numbers for that. The idea of a rainbow table is that we can speed up the time to find the password by comparing against values that have been calculated by someone else. They still need to be calculated, but let's say someone else did it for us. If we need to calculate them ourselves, we saw it takes about 7 days, but if someone's done it for me, it takes me less than 1 hour. Let's say I'm not prepared to wait for 7 days; 1 hour is okay. We said it takes about half a terabyte to store this information.

But with a salt, what we'd need is a rainbow table for every possible salt value. As the attacker I don't know in advance what salt value the user has, so once I see it I would choose the rainbow table that uses this salt value and then do a lookup, and it would take less than 1 hour. The problem is that we'd need to generate rainbow tables for every possible salt value. There are, in this example, 4 billion possible salt values. I would need 4 billion tables of this size, 4 billion half-terabyte tables. Even in 10 years that's still not going to work. And generating those 4 billion tables would have taken someone else 4 billion times 7 days. Again, not possible.

Effectively the salt increases the password length, but it doesn't make it inconvenient for the user, because the user doesn't know or care about the salt value. It's generated by the system, but it's effectively increasing the password length, making it much harder to pre-calculate.

Can we keep the hash function if we don't use the salt value and just use the hash? Yes, we can do that. But if we go back to this approach, then the attack is possible if someone has pre-calculated values. It's possible if we have a short password: we saw it takes about 7 days to do an attack, and it takes less than 1 hour if someone has pre-calculated the values.
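The salted rainbow-table figures quoted above (a 32-bit salt, about half a terabyte per table, about 7 days to build one table) can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes one table per salt value and tables built one after another on a single machine, which the lecture does not spell out.

```python
# Figures quoted in the lecture; everything else is derived from them.
SALT_BITS = 32          # a 32-bit salt
TABLE_SIZE_TB = 0.5     # about half a terabyte per pre-calculated table
TABLE_BUILD_DAYS = 7    # about 7 days to calculate one table

# Number of distinct salt values, the "4 billion" in the lecture.
salt_values = 2 ** SALT_BITS

# Assumption: one full table is needed for every possible salt value,
# and the tables are built one after another.
total_storage_tb = salt_values * TABLE_SIZE_TB
total_build_days = salt_values * TABLE_BUILD_DAYS
total_build_years = total_build_days / 365

print(f"possible salt values:     {salt_values:,}")
print(f"storage for all tables:   {total_storage_tb:,.0f} TB")
print(f"time to build all tables: {total_build_years:,.0f} years")
```

With these inputs the storage comes to roughly 2 billion terabytes and the build time to tens of millions of years, which is why pre-calculation stops being practical once a salt is used.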
That's if we use a hash only. As we've gone through, we said that if the password is 8 characters, we can try all possible passwords; there's no random value there. If you want to include some random characters in there without the user knowing (importantly, the user doesn't know the salt and doesn't care, so it doesn't make the user's password longer and it's not more inconvenient; the system creates the salt and stores it itself), then that effectively increases the length of the password that an attacker needs to try, at least for pre-calculated values. So if you want to add some random characters, that's effectively what the salt is doing. Salt is just the name: it's random characters, random bits.

We still can't stop an attacker trying all passwords. That attack that took 7 days is still possible; that hasn't changed. We've just stopped the attacker using pre-calculated rainbow tables, stopped them from just buying hard disks from some company with pre-calculated values. Once they buy that and have it, they can use it to attack any password. We can stop that by using a salt value, because the attacker would need the rainbow table with the correct salt value, which they don't know.

Let's summarize the recommended practice for storing passwords. When you create your website, you will usually have a set of users for your website, and you need to store their username and their password. How do you store it? Well, when storing login information, always store a hash of a salted password. We say we salt the password, and the concept is: take the user's password, generate a random value, combine the password and the random value, and hash that. Store the hash value, the random value (the salt) and the ID. We'll come back to passwords in a moment.

What about the salt? How long should it be?
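Before turning to the salt length, the storage scheme just summarized (store the ID, the salt and the hash of the salted password) can be sketched in Python. This is a minimal illustration, not a production design: the `users` dictionary stands in for a real database, the names `register` and `check_login` are made up for the example, and the choice of PBKDF2 with these salt and iteration settings is an assumption of the sketch rather than something the lecture prescribes.

```python
import hashlib
import hmac
import os

SALT_BYTES = 16        # assumption: 128-bit salt (the lecture says 32 bits or longer)
ITERATIONS = 100_000   # assumption: tune so one check takes a noticeable fraction of a second

# Hypothetical in-memory "database": username -> (salt, hash of salted password)
users = {}


def hash_password(password: str, salt: bytes) -> bytes:
    """Combine the password with the salt and hash the result with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    """Generate a random salt and store the ID, the salt and the hash."""
    salt = os.urandom(SALT_BYTES)
    users[username] = (salt, hash_password(password, salt))


def check_login(username: str, password: str) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


register("alice", "mysecret")
register("bob", "mysecret")
print(check_login("alice", "mysecret"))        # correct password
print(check_login("alice", "wrongguess"))      # wrong password
print(users["alice"][1] == users["bob"][1])    # same password, different salts
```

Because each user gets a separate random salt, two users with the same password end up with different stored hashes, so one pre-calculated table cannot cover both.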
The salt should be random; 32 bits or longer is fine. See, 32 bits increases the pre-calculation effort by a factor of about 4 billion. Some systems use shorter salts, which is still okay, but the longer the salt, the harder it is for the attacker. 32 bits is fine. And it's generated by the system and stored, so it stays the same: when you create your account the salt is created, and it just stays in the database.

Choose a hash function which is slow. We don't want a fast hash function, because that makes it easier for the attacker. Not so slow that it takes a long time to create and check the password, but there are some functions which are slow and where you can adapt the speed, so that when someone logs in maybe it just takes a second to check their password, but it makes it very hard for the attacker to try all possible passwords. So there are some functions, bcrypt, scrypt, and Password-Based Key Derivation Function 2 (PBKDF2) is another one, that are recommended for storing and hashing passwords. MD5 is not recommended. SHA-256 is maybe okay, but there are other ones recommended.

Importantly, design for failure. All of the discussion we've had assumed the attacker can find the password database. Don't think that you can keep your password database secure. Design your system assuming that someone can break into it and find the password database, because there are many avenues of attack. We need to design systems that have multiple security mechanisms; we don't rely on just one security mechanism, because if that fails, the whole system fails. So assume someone can find the database. Make it secure, but in case they do find it, use a salt and a hash so that it is very, very hard for them to find the passwords.

Any final questions on salt or storing passwords? Try and get your head around those concepts. If you don't follow, maybe repeat the calculations I've done: how many passwords are there, what would it take for an attacker to find the password? If you want, you can try with smaller examples. I used 8 characters; let's say a 3-character password chosen from
the letters a, b, c, d, e, just 5 letters, and look at all combinations, and then apply that to a more realistic scenario.

The last few slides are about passwords, just as some examples. That was about storing passwords, but all of it assumed random passwords, and users don't choose random passwords. So how do users choose passwords? Anyone? Recommendations, not of what your password is, but what's a good way to choose a password? How do you choose?

A name. So you use your name as your password. Okay, use a name. But as an attacker I have a dictionary. What I've done is I've gone and downloaded databases of names in every language. So when I'm trying to guess your password, I don't try random passwords first. I first try a list of words from a dictionary, then combinations of those words, and then a list of names of people. And there are not many; there are not billions and billions. Even then, with fast hardware that can make many attempts, it doesn't take long to try all possible names and even combinations of names. Using a name may stop someone from guessing while they try to log into your system, but it doesn't stop someone who has the password database from trying many attempts.

What else can you use as a scheme? Any other suggestions, different suggestions? Don't tell me your password, but think about how you chose your password. Any ideas or recommendations to the other students? How do you choose a strong password?

Hmm, birthday. Okay, how many possible birthdays are there in the world? Not many. Consider the last 50 years or the last 70 years; there are 365 days in the year, so even over the last 100 years there are about 36,500 values to try. Not many. It takes me a second to try them. All right, combine that with other information.

So choosing values is not easy, and there are really two different avenues of attack. One is if someone can try all possible values; the other is if someone knows you and
they're trying selected values. We spoke about the different vulnerabilities. If someone knows you, they know your birthday; they don't have to guess it. They think, this person uses his birthday to choose his password, I'm going to try a few variations on his birthday, and maybe they'll get it. So you need to think of different strategies for selecting passwords, and it's not easy. You need to consider the different vulnerabilities that were listed in the previous lecture and try to select passwords that defeat those vulnerabilities.

These are just a couple of slides where someone has done an analysis of leaked passwords. That is, people have found a password database and released it on the internet, and then others have done some analysis and looked at statistics of the words or the structure of the passwords chosen. This one was from a leak of about 300,000 passwords. I'm not sure which company they were leaked from; I can't remember. They analysed those 300,000 passwords that people chose, and about a quarter were dictionary words. A dictionary word is something that's in a dictionary, and there are only about 100,000 words in the English dictionary. Consider other languages and there are still not many. So many people choose passwords using words from dictionaries. 8% were names of places, 14% names of people, and the other big one is numbers (not 1234, I hope, but other numbers). For 31% they couldn't recognize any pattern, so maybe they were good passwords. This is not saying what to select; it's just showing some trends in what people do select, and the fact that using dictionary words makes it much easier for the attacker, because all the attacker needs to do is try all the values in the dictionary: 100,000 values, very fast.

Next, an analysis of passwords leaked from Sony. The length: typically passwords are 6 to 8 characters long. Some are longer, some are shorter. It depends upon the system; maybe it
has constraints. Some other characteristics: most people use only alphanumeric characters, letters and numbers, A to Z and 0 to 9. They don't use punctuation characters. So an attacker, when they try passwords, will first try passwords that have just letters and numbers. They don't need to try the passwords with punctuation characters, because most likely someone hasn't used them.

Most passwords are in dictionaries, or in what you can download: not just normal dictionaries, but lists of words which combine normal words, combine dates, combine different names, and combine characters in different ways, like replacing the letter L with the digit 1. So there are dictionaries attackers can use that have all these values to try. If you look at the lists of leaked passwords, the most common ones usually include these: 123456, password and so on. They are very common passwords in some of the leaks.

What about using the Thai language? Does it help? Not really against dictionary attacks, because you just get a dictionary in Thai. They're available. And if someone knows you, they can guess the language you've used, one of several. So using a different language may increase the effort by some factor, it may double the effort to check. It can help a little bit if you start to combine different languages, but just using everything in Thai is not much better than using everything in English, because someone can just try the Thai dictionary. There may be other ways to improve.

Another thing: some systems require you to change your password on a regular basis. Most users, when they're forced to change their password, change it by a single character. Say their password was password, and the system automatically requires them to change it every month. The second month they set it to password1, the next month password2, and coming back to January next year, password1 again. So changing passwords
doesn't necessarily help, because users don't like to change their password. Sorry: forcing users to change passwords doesn't necessarily help. Changing passwords does, but users usually choose the same one or a similar one.

So how do you select passwords? We will not discuss it other than just listing some different strategies.

Make sure users are aware of the issues of selecting bad passwords, so they know that if they select an easy-to-guess password there can be consequences. So inform and educate users about choosing good passwords, and advise them on strategies for doing so.

Computer-generated passwords: the user doesn't get to choose the password. When you create your account, the system creates one for you. Anyone seen those systems before? Has anyone had that, where you create an account and you can't choose a password, but the system creates one random password for you? Yeah, when you got your first account on the ICT server, I think you got an email from me, from the admin, with a random password in there. Can you remember that value? Unlikely. Randomly generated passwords are not very easy or convenient for the users, so they can be more secure against guessing but harder for the users.

Pronounceable words: you can put some constraints on those random characters, such that, for example, consonants are followed by vowels, so it's a little bit easier to pronounce. Something that you can pronounce may be easier to remember, but it's still not very convenient.

Check passwords, either reactively or proactively. Reactively: all the users on the Moodle system choose their passwords, I let you choose your own password, and then every month I check the strength of your passwords. That is, I try to crack your passwords, and if I find a weak one that I can find easily, then I inform you, saying your password is weak, try and change it. So if I see that all the users have chosen 123456 as their password, maybe I can tell them to try a new one. Or proactively: when the user is selecting
the password, advise them on the strength. Many websites do this now: you type in your password and the website gives some feedback, either this is a weak password, try again, or this is a very strong one, good, you can proceed. So give some feedback when they select the password.

There are other issues with passwords; the main thing we've focused on is storing passwords. Any questions or discussion on passwords before we stop?

Think about the passwords that you have for all your systems. Think about their length: are they long enough? Are they dictionary words? Are they easy to guess? Are they reused across many systems? That is, you use your password for your ICT Moodle account, and the same password is used for your email, for your bank and for other accounts. Maybe someone hacks into the ICT server and gets your password; now they have your password for the bank. Or maybe you don't trust me. I'm the admin for the ICT server. I set up the server so that it records your password when you next log in, and now I know your password, and so your password to your bank, your email and everything else. So reusing passwords across systems is troublesome. Think about what passwords you use and how you can improve your password usage.

Yep? If you have an old password and you add more characters to the end? It depends a little bit on what your original password was and what characters you add, but generally the longer it is, the more secure it is. So making it long is a good approach. But of course, choosing a password which is passwordpassword, even though it's, what, 16 characters long, is not very secure. So you can't just choose any 16 characters; choose 16 characters which are unlikely to be in a dictionary and are hard to guess.

We're focused on how to authenticate users. A user wants to access a computer system, and the computer system needs to check that this is the right user. The main form is passwords. There are other forms, and we will not really go through them, I think, given the time.
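Going back to the proactive checking and the passwordpassword example, here is a rough sketch of the kind of feedback a system might give when a user selects a password. The word list, the function name `password_feedback` and the three rules are all made up for illustration; a real checker would use a large dictionary of common passwords, words and names, and more careful rules.

```python
from typing import List

# Hypothetical, deliberately tiny word list; a real checker would load a large
# dictionary of common passwords, words and names.
COMMON_WORDS = {"password", "123456", "qwerty", "letmein", "secret"}


def password_feedback(password: str, min_length: int = 8) -> List[str]:
    """Return a list of warnings; an empty list means no problem was spotted."""
    problems = []
    lowered = password.lower()

    if len(password) < min_length:
        problems.append("too short")

    # Letters and digits only: the character set an attacker tries first.
    if password.isalnum():
        problems.append("only letters and digits")

    # Dictionary material anywhere in the password, which also catches
    # "password1" and "passwordpassword" despite their extra length.
    if any(word in lowered for word in COMMON_WORDS):
        problems.append("contains a common word")

    return problems


for candidate in ["123456", "password1", "passwordpassword", "k#9vT!q2Lx$w"]:
    print(candidate, "->", password_feedback(candidate) or "no obvious problem")
```

The point of the sketch is that length alone is not enough: passwordpassword passes the length rule but is still flagged because it is built from a dictionary word.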
Token-based is to use, for example, some swipe card or some USB token that provides some other form of authentication for the user, something that identifies that user. There are different types of tokens. For example, bank cards now have embedded processors on them that can do some exchange with the reader to perform some authentication. Some are contact cards, that is, they need to touch the reader; others have an inbuilt antenna and are contactless. And with mobile phones you're now starting to get this near-field communication, NFC, such that you can bring two mobile phones together, or a mobile phone near a reader, and they talk to each other wirelessly to do some form of authentication. So the phone is the token in that case, the object that identifies you. Bank cards, ATM cards and so on are examples. But they're not very good for replacing passwords, because you require some reader, and if you lose the token, if you lose the card, it's inconvenient. So they're used for some systems, but not for many online systems. Similarly smart cards.

The other form of authentication is biometric: fingerprint, voice recognition, eyes and so on. It's based upon the unique physical characteristics of the user. The main ones are your face (the shape of your face), your fingerprints, your hand, your retina and your iris; the characteristics of them are usually unique to a human. Your voice and your written signature are also common biometric authentication techniques. Some are easier than others. The general trade-off is that some are more accurate in identifying someone, but more costly in terms of implementing the devices to perform the authentication. So iris detection is very accurate, but it's expensive to implement: to scan your iris you need some expensive hardware. But if it works, you can accurately identify the user; there's not much chance that you'll get the wrong person. Voice is relatively easy or cheap to implement, you just measure someone's voice, some audio,
but it's harder to distinguish users. It's more likely you'll get two users that are identified as the same person by the system. And I think that's all we'll say about them. There are a few other slides, but they're not part of our coverage.

We care about what you know: passwords, which we've looked at. The others include what you possess and what you are or do: tokens and biometrics. Try to understand the main concepts of passwords and password storage, and think about how to select passwords.

On Thursday we'll have a quiz in class, a paper quiz: answer some questions over 10 minutes. What topics? Everything, I think, everything that we know so far: passwords and some cryptographic techniques. So, something about the concepts of cryptography, passwords and password storage.
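As a revision aid, the small exercise suggested earlier (a 3-character password chosen from the letters a to e) can be worked through in a few lines. This is only a sketch under stated assumptions: the password `bad` and the salt `x7Q` are made-up values, the salt is joined at the end of the password as in the lecture's example, and MD5 is used only because it is the hash in the lecture's examples, not because it is recommended.

```python
import hashlib
from itertools import product

ALPHABET = "abcde"   # the 5 letters from the suggested exercise
LENGTH = 3           # 3-character passwords


def md5_hex(text: str) -> str:
    """MD5 hash of a string, as a hex string (MD5 only to match the lecture)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# All possible passwords: 5 ** 3 = 125 combinations.
candidates = ["".join(chars) for chars in product(ALPHABET, repeat=LENGTH)]

# Pre-calculated table of unsalted hashes, standing in for a rainbow table.
precomputed = {md5_hex(p): p for p in candidates}

secret = "bad"   # hypothetical user password
salt = "x7Q"     # hypothetical salt, stored next to the hash

unsalted_hash = md5_hex(secret)
salted_hash = md5_hex(secret + salt)


def brute_force(target_hash: str, known_salt: str):
    """Try every candidate with the known salt; still possible, just not pre-calculated."""
    for guess in candidates:
        if md5_hex(guess + known_salt) == target_hash:
            return guess
    return None


print(len(candidates))                  # number of passwords to try
print(precomputed.get(unsalted_hash))   # a table lookup cracks the unsalted hash
print(precomputed.get(salted_hash))     # the same table is useless against the salted hash
print(brute_force(salted_hash, salt))   # trying all passwords with the salt still works
```

The same table lookup that finds the unsalted password fails on the salted hash, while trying all 125 passwords with the stored salt still succeeds. That matches the earlier point that the salt stops pre-calculation but not brute force.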