We said that when storing passwords, we store the password, together with the username or user identity, on the system that we're trying to access. When a user tries to access the system they supply their ID and their password, and the submitted values are compared against the stored values: we look up the ID, then compare the submitted password against the stored password. If they match, the user is authenticated. If they don't match, they are not allowed in.

We said that we don't want to just store the password in the database. For a system with many users, if that database is compromised, that is, an attacker gets access to the system and to the database, then all of our users' passwords are released. The attacker can use them to access the system if we don't reset everything, or potentially to access other systems if people reuse passwords across different systems, and they do. So we don't want to store the password in the clear. We need to hide the password but still be able to check that the password supplied is the correct one.

We said that we could encrypt the password, and that works: it hides the password. But there's a problem. When the user supplies their password we need to either decrypt the one in the database or encrypt the supplied one. In both cases we need a key, and that key needs to be stored somewhere so that the software can do it automatically. If we store the key on the same system as the database, then an attacker who can access the database can also access the key, decrypt, and learn all the passwords. So we don't gain much if we store the key on the same system as the database.
It turns out we don't have to encrypt; we can use a hash function. In the database we don't store the password, we store a hash of the password, and there are two important properties of hash functions that we rely on.

The first is the one-way property: it's hard for an attacker who has the hash value to find the original input, the password. If our database consists of many rows, each containing a user ID and a hash of a password, an attacker who obtains that database sees only the hash values. If we've used a secure hash function, the one-way property says they cannot easily work back from a hash value to the password. So they can see the hash values but they don't know what the passwords are, and they don't know which passwords to supply on other systems.

The other point is how we check the password. When users submit their password they don't submit the hash value, they submit the actual password, while our database stores the hash. To check, we calculate the hash of the submitted password and compare it to the stored hash. The collision-free property says that, in practice, two different passwords will produce different hash values, so if the hashes match it means the passwords were the same. And if the submitted and stored passwords are the same, then the hash values will be the same. So the user submits the password, the system calculates its hash and compares it against the stored hash. If they match the user is authenticated; if not, they're not allowed in.

That's the common approach to storing passwords: store the hash of the password. We'll go through it in a bit more depth and look at some numbers for what an attacker needs to do to find the password even if only the hash of the password is stored.

To recap the basic approaches: if we store the ID and the password in the clear, the attacker can get access to that password. If we encrypt the password, we need to do something with the key; we usually need to store it on the system so that we can decrypt, and then the attacker may find it. So the common approach, the first approach, is to store the ID and a hash of the password. If you have a popular website with one million registered users, you'd have a database with all of their IDs or usernames and the hashes of all their passwords. When I say database here, it could be an actual relational database, a file, or any other storage system. Let's consider this case and put some numbers to it. If that's the way we store passwords, what does an attacker need to do? How much effort do they need to break that system?
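The register-and-check procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory dictionary `password_db` and the function names are hypothetical, and MD5 is used purely because it is the hash in the lecture example, not because it is suitable for real password storage.

```python
import hashlib

# Hypothetical in-memory "database": user ID -> hex digest of the password hash.
password_db = {}

def hash_password(password: str) -> str:
    # MD5 is used only to mirror the lecture example; it is far too fast
    # to be a good choice for storing real passwords.
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def register(user_id: str, password: str) -> None:
    # Store the hash, never the password itself.
    password_db[user_id] = hash_password(password)

def authenticate(user_id: str, submitted_password: str) -> bool:
    stored_hash = password_db.get(user_id)
    if stored_hash is None:
        return False  # unknown user
    # Hash the submitted password and compare it with the stored hash.
    return hash_password(submitted_password) == stored_hash

register("john", "mysecret")
print(authenticate("john", "mysecret"))  # correct password
print(authenticate("john", "wrongpwd"))  # incorrect password
```

Note that the system never needs to reverse the hash: checking a login only ever hashes the submitted value and compares.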
Let's go through some simple examples. You have these examples in the handout: at the end of the slides on authentication, somewhere around page 133, I've written up a description of passwords, hashes and what we're going through, so you'll see this data on page 133 and beyond. The PDF is on the website as well.

This first example shows what our database may look like if we store the usernames and passwords in the clear. We have our usernames and the passwords those users chose when they registered with the system. The problem is that if an attacker can access our system through other means and download this database, they've immediately learned everyone's password. It doesn't matter how insecure or how random a password was; the attacker immediately learns it. That's a big problem for websites, because one of a website's key assets is its list of users, and websites often make money from having a large number of users. If the website is compromised and an attacker obtains the database, those users are effectively compromised, their personal data is released, and they may no longer use that website.

In the example we'll go through, we'll assume that users choose an 8 character password. That's just for this example; they don't have to in general. There's another benefit of hashing the password, but we'll come back to that.

So we don't want to store the password in the clear; we want to calculate the hash of the password. Here's the same database, with the same passwords in fact, but with the hashes of those passwords stored. I used MD5 as the example hash function. When each user registers they choose a password, say John chooses "mysecret". The system calculates the MD5 hash of "mysecret", which produces this value, 06C2 and so on, and the system stores that hash value in the database. For each of the users the hash value is stored.

Now in this case, if an attacker gets access to the database they don't have immediate access to the passwords, only to the hash values. What do you do to find the passwords if you're the attacker and you get this second table? One basic approach is to try to defeat the hash function itself. In the same way that with encryption we can talk about a brute force attack that tries every key, with hash functions you can try to defeat the one-way property. We say the one-way property means that given the hash value you can't go backwards. In theory you can go backwards, but in practice it takes a long time, and the time it takes to find the input given the output depends in general upon the length of the hash. I'm not sure if it's mentioned on the slides, but a brute force attack on an n-bit hash value requires on the order of two to the power of n attempts to find the original input.

The MD5 hash function used in this example produces a 128-bit hash value. So if you know the hash value, getting the original password with a brute force attack requires two to the power of 128 attempts. What do I mean by attempts? Essentially applying the hash function to different values: guess a password, choose a random value, calculate its hash and see if it matches. We'll see in a moment that we can be a little smarter than brute force. How long do two to the power of 128 hash calculations take? About a million? About a million what?
A million seconds, a million days, a million centuries? Well, let's try to put some numbers to it. Say I need to calculate the MD5 hash function two to the power of 128 times to do this attack. To put a time on that, we need to know how fast a computer can calculate a hash function. Have a guess: how many hashes do you think my computer can calculate per second? Just a guess, that's alright. More than a thousand? More than a thousand is correct. Maybe a bit more? Millions? So millions per second.

Let's see on a couple of computers, then we'll talk about some typical values and what they mean in terms of time. I'll try on my computer; actually not my laptop, I've logged into my desktop in my office. My desktop is quite old now, three or four years old, with a standard Intel i5 CPU, so you can get faster. Let's run a speed test with OpenSSL, which you're using for your homework (as a reminder, you have homework to finish this week). MD5 is the hash function we're using in this example, but there are others, as we'll see. I won't run the speed test to the end; we'll stop there because that's enough. My computer did about 12 million, almost 13 million, MD5 hashes in three seconds. So 12 million in three seconds is 4 million per second; my desktop can do about 4 million MD5 hashes per second. Other hash functions may have different speeds. SHA-256 is slightly slower, about 10 million in this run, so still in the same order of magnitude: millions per second. My laptop is a bit slower. So here we have about 4 million per second.

How long does it take to try 2 to the power of 128 attempts? If our speed is 4 times 10 to the power of 6 hash attempts per second, then 2 to the power of 128 divided by 4 million gives us the time in seconds. Divide by 60 to convert to minutes, convert to hours, and let's even convert to days. The answer is about 10 to the power of 27 days on my computer. In years, it's about 10 to the power of 22 centuries. So this brute force attack is not possible on my computer. But maybe I have access to a thousand computers, maybe the lab computers and others, or I can spend some money. Divide by a thousand: 10 to the power of 19 centuries. So, like brute force attacks on keys, this is not possible in practice. The answer to "how long?" is "too long". That's the reason for hashing the value.

But this is a brute force attack, and we can be a little bit smarter. Brute force in this case really means: try a random input, calculate the hash, compare against the existing one.

Just to give you another number on the speed: my computer is actually quite slow, and you can get dedicated hardware. Mine was an Intel CPU. How could I calculate hashes faster with some common hardware? Change my hardware to what? There's not much better nowadays than a 4 gigahertz Intel CPU, so you won't go much faster that way; maybe you'd go up to 8 million per second and halve the time. Overclock? What about some different hardware? Anyone play games? It turns out that graphics cards, GPUs, can calculate hash functions much faster than CPUs in many cases. Graphics cards are hardware dedicated to graphics-type operations, and some of those operations are similar to how hash functions work. So graphics cards can be much faster than general purpose CPUs, depending upon the GPU. Here's some data, maybe a little bit old. Let's just look at PC1 and PC2. PC1 had an AMD graphics card.
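As a rough check of the CPU brute-force estimate worked through above, the arithmetic can be reproduced in a few lines of Python. The hash rate of 4 million per second is the figure measured in the lecture; the variable names and the 365-day year are assumptions made for this sketch.

```python
# Back-of-the-envelope time for a brute-force attack on a 128-bit hash.
HASH_BITS = 128            # MD5 output length in bits
CPU_RATE = 4e6             # hashes per second (lecture's desktop measurement)
NUM_COMPUTERS = 1000       # the "thousand computers" scenario

attempts = 2 ** HASH_BITS
seconds = attempts / CPU_RATE
days = seconds / (60 * 60 * 24)
years = days / 365         # assumption: 365-day year, good enough here
centuries = years / 100

print(f"days on one computer:      {days:.1e}")
print(f"centuries on one computer: {centuries:.1e}")
print(f"centuries on {NUM_COMPUTERS} computers: {centuries / NUM_COMPUTERS:.1e}")
```

The results land at roughly 10 to the power of 27 days and 10 to the power of 22 centuries, matching the orders of magnitude quoted in the lecture; spreading the work over a thousand machines only removes three of those powers of ten.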
We'll see the numbers in a moment. PC2 had an Nvidia graphics card. The graphics cards were used to do the hash calculations, not the general purpose CPU. The number PC1 could achieve is 8.5 billion hashes per second. My computer could do about 4 million, so this GPU is about 2,000 times faster. With dedicated hardware you can do much better than a normal or an old PC. The table gives 8,581 million hashes per second; let's say it was not 8,500 million but 10,000 million to make it a nice round number, so 10 to the power of 10 hashes per second. Different algorithms have different speeds, and different hardware does as well. Let's record that number; we'll use it later.

What if we could do about 10 to the power of 10 hashes per second, which is more than 1,000 times faster than my computer? Even at 1,000 times faster we saw the number was 10 to the power of 19 centuries. It's still too long to do a brute force attack, but we'll use that number shortly with another type of attack.

How can we be a little bit smarter as an attacker? We want to find John's password and we have his hash value. Let's assume that our users were smart when they chose passwords, say they all had random passwords that were generated for them. Still, we want to find the password. I think if we look at the table, one of the users, Sandy, did choose a random password. As an attacker, since we know passwords are usually short, we should only try passwords of that particular length. A hash function takes any length input and produces a fixed length output, but when we apply a hash function to a password, it's typical that the password is small. So if we know something about the length, we can try passwords of a particular length: calculate the hash of some possible passwords and compare to the stored value. If they match, we've found the password.

So let's consider how long it takes if we attack by trying passwords of particular lengths rather than just trying random values as in the brute force attack. When I say "too long" for brute force, it's in the order of 10 to the power of 22 centuries or years; it doesn't matter which, it's still too long. Let's try something different.

Let's assume we know that the passwords are limited to 8 characters. Maybe we know something about the design of the system, and passwords are up to 8 characters in length; no one has more than 8 characters. So as an attacker there is a limited set of possible passwords: those which are 1 character long (there shouldn't be many of them, and maybe the system doesn't allow that, but we may consider them), those which are 2 characters long, 3, and so on up to 8 characters long. How many are there?

Let's go forwards. If we have a password which is 1 character long, how many possible passwords are there? You get to choose, let's say randomly, so how many possible values are there for a 1 character password? I see people calculating. How are you going to calculate this? A to Z, that's 26. 1 to 9? What about 0 to 9?
That's another 10, so 36. What about A to Z uppercase? Passwords can generally be uppercase or lowercase, so that's 62: uppercase, lowercase and numbers give 62 characters. Can we use other characters in passwords? It depends upon the system, but often we can. The upper limit is generally set by the number of characters our keyboard handles. Let's stick with English: how many characters can we create on a keyboard? There are the 62 numbers and letters, plus all the punctuation characters: hash, comma, exclamation mark and so on. There are about 32 of those; look on your keyboard and count the normal punctuation characters. Every number key has a special character above it, so that's 10 of the 32, and then there's another set of characters like the brackets, comma, apostrophe, double quotes and so on. So we often say that the number of printable characters on a keyboard is 94, and we'll assume that for this case. Different keyboards may have different limits.

So if you get to choose a password which is one character long, there are 94 possible values. What if you're allowed two characters, chosen randomly? The first character could be 1 of 94 and the second character could be 1 of 94, so we have 94 squared. Three characters: 94 to the power of 3. We can repeat characters if we like. In general, in this example, it's 94 to the power of x, where x is the number of characters. For exactly 8 characters there are 94 to the power of 8 possible passwords.

The passwords that were chosen come from this set; they are 1, 2, 3 or up to 8 characters long. So what an attacker needs to do is choose a possible password, calculate its hash, and compare the calculated value against the stored value. If they match, we've found the password. Say the attacker knows the hash value of Sandy's password is 5FC2 and so on; call that h. In this attack they choose a possible password p1, maybe the 1 character password A, calculate the hash of p1, and compare it to h, the value we're trying to break. If the hash of p1 equals h, it means the password Sandy chose is p1. If not, they try another password and compare: does it equal Sandy's hash? If not, move on to the next password, maybe the next letter, calculate the hash, compare, and if it's not equal, move on. There are only 94 of the 1 character passwords; once we've done them, move on to the 2 character passwords and keep going.

Assuming that the password is 8 characters or less, in the worst case the attacker needs to try all of these possible values. One of them is the password, so one of them will produce a hash that matches. They keep going, and eventually they try, let's say, pm: they compare the hash of pm to Sandy's hash. Does it match? Yes. Now they've found the password of our user Sandy.

This is slightly different from the brute force attack, where you just try values of any length. Here we know that the password is of limited length and is made up of just printable characters, so we only have to try that smaller set of values. How many possible values do we need to try in the worst case? All of them. Let's add up those numbers. With 4 characters that's 78 million; add 94 to the power of 5, 94 to the power of 6, and so on. The total for up to 7
characters is about 6 times 10 to the power of 13. 10 to the power of 9 is a billion, so this is about 60,000 billion possible values. If we add in 94 to the power of 8, we go from about 10 to the power of 13 for up to 7 characters to about 10 to the power of 15 for up to 8 characters. Of course the most significant contributor is the 8 character passwords. We have about 6 times 10 to the power of 15 possible passwords, and 94 to the power of 8 on its own is also about 6 times 10 to the power of 15. So the rest, the 7 character passwords and shorter, only add on a little bit more; the 8 character passwords are the major proportion. It's the longest password length that takes the most time. So sometimes for simplicity we'll say the total is about the same as 94 to the power of 8, because that term is so large and the others are relatively small.

How long does it take? For all of those values, 6 times 10 to the power of 15 of them, we have to calculate the hash and compare. Calculating the hashes is the slow operation; comparing the values is very fast, so we don't count it in terms of time. How fast can we calculate hash values? We're assuming we've got that GPU, that graphics card, and we can do about 10 to the power of 10 per second. So roughly we have 6 times 10 to the power of 15 divided by 10 to the power of 10, which gives us the number of seconds. Divide by 60, divide by 60 again to get hours, then by 24 to get the number of days. So now, with our reasonably cheap graphics card, maybe 10,000 or 20,000 baht, it takes about 7 days.

About 7 days is a reasonable time for the attacker. They leave their hardware running for a week and eventually they find Sandy's password by trying those 1, 2, 3, up to 8 character passwords. They'll eventually get the correct password. And in fact in those 7 days they won't just find Sandy's password; they'll find the passwords for all the other users' hash values at the same time, because those passwords are also in that same set. So although a brute force attack on a hash is in general not possible because of the one-way property, because we know the input to the hash function, the password, is small, we can just try all possible passwords and after a week find the correct one. So the hash hasn't helped much in this case.

How do we prevent such an attack? How do we make it harder for the attacker? Hash twice, hash multiple times, so they need to do 2 hashes per attempt? Sometimes hash functions don't work like that and 2 is no better than 1, but if it does work, then yes, hashing twice slows things down. It's now 14 days; they have to wait 2 weeks instead of 1 week. Maybe the attacker is willing to wait for 2 weeks.

What else could you do? Change the way that you hash: use the same algorithm 2 times, or use a different algorithm. Some algorithms are faster than others, and in this case we want a slow algorithm. MD5 is fast to calculate, and that makes it easy for the attacker. We want a slow algorithm, such that if the attacker tries this it takes them a long time. So here we care about the speed of the hash algorithm, and from a security perspective we'd like a slow one. If you look at these results, MD5 on this PC got 8.5 billion per second and SHA-1 only 3 billion, so that's roughly a factor of 3 slower. SHA-512 got less than half a billion, so that's much slower: from 8.5 billion down to half a billion is almost 20 times slower. So instead of taking 1 week it would take about 20 weeks if we use the SHA-512 algorithm. So using a different algorithm can help, and some algorithms are recommended over others for hashing passwords.

What else can we do? Suppose we require the user to use a 9 character password: they must use 9 characters, not 8 or fewer. Then with a 9 character
password, how many possible values are there? 94 to the power of 9. If we try 8 characters we need 94 to the power of 8 attempts; with 9 characters it's 94 times as many as that, so it would take 94 times as long as the 7 days. About 100 times 7 is 700 days, which is about 2 years. So adding one more character to the password, and making people use that 9 character password, assuming it's random, makes this attack grow to 2 years rather than 1 week. That can be effective: the attacker doesn't want to wait 2 years to find someone's password, and the user has probably changed it by then.

Any questions on breaking hashes of passwords? The numbers are given in the printed handout so you can keep track of them there. What it makes clear is that it's possible for the attacker to find the password if we use a small password and a particular hash algorithm, MD5 in this case, which is quite fast for the attacker to calculate. So we need to carefully design the password selection strategy for the users, and the storage, so that such an attack is not possible.

Even worse, there are ways to speed it up even further for the attacker. Instead of calculating all these hashes over 7 days, the attacker goes to a website and downloads a database, or pays for someone to deliver on a hard disk a database with all the hash values already calculated. Here's an example. This database covers passwords of length 1 to 8, which is what we looked at, using the ASCII character set labelled 32 to 95, so that covers the printable characters. In this case it's 6 times 10 to the power of 15 values, as we said. So you can buy this database, and it contains a list of hash values and the corresponding passwords. Instead of you as the attacker having to calculate all of them yourself over 7 days, you buy this and then you just do a lookup. You know Sandy's hash value, you compare it to the table in this database, and you find the password immediately. Looking up a table is much faster than calculating a hash. Instead of taking 7 days, people report that it takes less than an hour to find the password from such a database.

The basic approach with these databases is that the database stores the password and the hash of the password for every possible password value, all 94 to the power of 8, or 6 times 10 to the power of 15, values. When you have that database and you know the hash value, you compare it to the hash column, find the match, and then you've found the password. That can be very fast.

How big is it, to store all of those passwords and hash values? Let's say an 8 character password is 8 bytes; we'll ignore the shorter ones. We also have the hash value, which with MD5 is 128 bits, and we need to store 94 to the power of 8 values of password and hash. 128 bits divided by 8 is 16 bytes, so the hash value takes 16 bytes of storage and the password 8 bytes: 24 bytes of storage per entry, times 94 to the power of 8 entries. That's approximately how many bytes we need. Convert to megabytes, gigabytes, terabytes: this database of all these passwords and hash values is about 146,000 terabytes. Let's say you have 1 terabyte hard disks; you'd need to buy 146,000 of them to store all of these passwords. Not possible. Who's going to do that? Where are you going to put all the hard disks? So storing all these values directly is not possible.

But what people have done is design data structures to store them in a much more efficient manner, essentially compressing that data. If we could store the full 146,000 terabytes we could do a lookup in maybe hours, not seven days, but since that's too large, people have designed data structures that reduce it down to about 500 gigabytes. There are two different formats here: 576 gigabytes, or even down to 460 gigabytes. The data structure is called a rainbow table. A rainbow table is designed specifically for storing these values in an optimized form, and it essentially compresses the data down to about half a terabyte, which is manageable. So you either download this table or you pay for a disk to be sent to you, and then you can do lookups. That's what's done in practice.

A lookup on a half a terabyte rainbow table takes hours. The password attack would take seven days if we tried it ourselves, but the faster way for the attacker, once they have this rainbow table of about half a terabyte, is to just look up the value in the table. The lookup may take in the order of hours, depending on where the value is in the set. So that's much easier for the attacker.

So storing hash values of passwords is still not so secure. If our passwords are eight characters or less, then as the attacker you can get in quite easily. You get this rainbow table, and you only need one of them because everyone's password is in that set, and you can break everyone's password in a matter of hours. So that's not secure.

The next step in making the storage of passwords more secure is not just to store the hash of the password, but to add a random value to the password when the user creates it. If a user has an eight character password, add some random characters to it, effectively increasing the length, and store the hash of that. That makes it practically impossible for such rainbow tables to be used. This extra value we add to the password is called the salt.
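The figures used in this part of the lecture (the number of candidate passwords, the search time on the GPU, the effect of a ninth character, and the size of a full lookup table) can be reproduced with a short calculation. This is a sketch using the lecture's assumptions: 94 printable characters, a GPU rate of 10 to the power of 10 hashes per second, 16-byte MD5 hashes, and 1 terabyte taken as 10 to the power of 12 bytes. The variable names are chosen for this sketch.

```python
# Reproduce the lecture's password-space, timing and storage estimates.
CHARSET_SIZE = 94          # printable characters assumed in the lecture
MAX_LEN = 8                # passwords of 1 to 8 characters
GPU_RATE = 1e10            # hashes per second (rounded GPU figure)
SECONDS_PER_DAY = 60 * 60 * 24

# All passwords of length 1..8, and the 8-character ones on their own.
total_passwords = sum(CHARSET_SIZE ** k for k in range(1, MAX_LEN + 1))
eight_char_only = CHARSET_SIZE ** MAX_LEN

# Worst-case search time when every candidate must be hashed.
days_up_to_8 = total_passwords / GPU_RATE / SECONDS_PER_DAY
days_exactly_9 = CHARSET_SIZE ** 9 / GPU_RATE / SECONDS_PER_DAY

# Naive lookup table: 8-byte password + 16-byte MD5 hash per entry.
BYTES_PER_ENTRY = 8 + 16
table_terabytes = eight_char_only * BYTES_PER_ENTRY / 1e12

print(f"passwords of length 1..8: {total_passwords:.2e}")
print(f"8-character passwords:    {eight_char_only:.2e}")
print(f"search time, up to 8:     {days_up_to_8:.1f} days")
print(f"search time, exactly 9:   {days_exactly_9:.0f} days")
print(f"full table size:          {table_terabytes:,.0f} TB")
```

The calculation shows why the 8-character term dominates the total, why one extra character stretches about a week into roughly two years, and why an uncompressed table of every password and hash is far too large to store directly.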
let's see coming back to our lecture slides hashing passwords is not enough because of the use of these stored tables which you can easily get access to as an attacker it's possible if the attacker does get the database for them to quite easily find the passwords so we don't gain much in terms of security so the recommended way to store passwords is slightly different where is it? here we have a new value added called a solve it's just a random number and we calculate the hash of the password combined with the salt so what happens with storage is that when you register your username and password you select say you have a random password or the password selected the system generates a random salt value an sbit value stores that in the database in the clear and when it hashes your password it actually combines your password the p with the salt concatenates them maybe takes your password adds the salt at the end and hashes that value together and that is stored let's see an example this is the example extending from our set of users from before this is the recommended way to store passwords so each username is stored in the database the user ID when a user registers the user chooses but the system the software generates a random salt value so here I've generated a 5 character random salt value it's essentially an sbit value and the hash value stored is a hash of the salt combined with the password and that's stored so the three values are stored the salt is public in the database it's not encrypted it's just a random number now what the attacker needs to do if they learn this database let's say the attacker now again wants to find Sandy's password what do they do they know the password is 8 characters or less so define Sandy's password same as before they try password 1 they calculate the hash of password 1 combined with the salt and they know the salt value so we don't gain much of security in this type of attack because the attacker knows the salt value they know 
the value 9, whatever it is, and then they calculate the hash of this possible password P1 with it and compare it to the actual hash value H. If it doesn't match, they try another password, same as we did before: P2, the hash of P2 combined with the salt (it's the same salt value), and they compare it to the stored hash value. They keep trying, and eventually they'll get the correct password: they take the hash of the password that they try with the salt (these characters here, just think of them as random characters) and compare it to the hash value H. If it matches, then we've found the password. It's the same as before from the attacker's perspective, except we must also include the salt in the hash calculation.
How long does this take, using our numbers from before? Someone wrote down the number: how long did it take to try, in the worst case, 94 to the power of 8 passwords? It took us 7 days when we did this before. It's exactly the same number of operations. The only difference is that when we hash the one-character password, like the letter A, we combine it with the salt value; then we try B; then we move on to the two-character passwords, and we go on. Worst case, we go through all the 8-character passwords and eventually get to the correct password. It still takes us 7 days. Nothing has changed here, so we don't gain security from that perspective: the attacker can still find the password in 7 days. But they cannot use the rainbow table, and that's where the salt has its value.
The rainbow table that we used before stored passwords and hashes of those passwords, and we saw that one of them was about half a terabyte. That's manageable, and if we can find the password in the rainbow table it takes hours, so the attack is not 7 days but hours. Now that we have a salt value, to do an attack using a rainbow table the attacker would need a rainbow table for that particular salt. They'd need a rainbow table that has the password and the hash of that password
combined with a particular salt value, u 9, this random value that the user had, and that would take half a terabyte. If they had such a table, they could still defeat the hash and find the password in a matter of hours, and it only needs half a terabyte of storage. The problem with this approach for the attacker is that they don't know the salt value in advance. Sandy's salt was this random set of characters. If they wanted to break someone else's password, they would need a rainbow table calculated with a different salt value. Note that the users have different random salt values: Daniel's was different, mine is different. So from the attacker's perspective, if they want to look up in a rainbow table, they need one rainbow table for each salt value. They need a second table, which contains the same set of passwords hashed with another salt value, taking up another half a terabyte, and so on.
Let's say we have an s-bit salt; the salt is s bits long. If s is, say, 16 bits, how many possible salts are there? If the random value attached is 16 bits long, there are two to the power of 16 possible values. For an attacker to be able to attack any password using any salt, they would need to download or buy two to the power of 16 disks, each half a terabyte in size. Two to the power of 16 is about 65,000, and at half a terabyte each that is about 32,000 terabytes. What's above terabytes? Petabytes. So they need about 32 petabytes of storage, about 32,000 terabytes, if they want to use rainbow tables to break the passwords, because the passwords may have any salt value and we don't know it in advance, so we need to have one table for each possible salt value. That requires a very large amount of storage. If we increase the salt length up to, say, 32 bits, that's 4 billion hard disks.
So what we do by introducing the salt is make the rainbow table attack impractical. Without a salt, the attacker can download one rainbow table of about half a terabyte and find the passwords in a matter of hours. But if we introduce a salt, then the attacker needs a rainbow table for every possible salt value, and that requires too much space to store, depending on the salt length. Even a small salt of 16 bits leads to about 32 petabytes of storage, which is too costly or not possible. So the recommended way to store passwords: you store the ID, the system generates a random salt value for the user, and when that password is stored initially, the hash of the password combined with the salt is stored. It prevents rainbow table attacks. It doesn't prevent the attack that took us 7 days, of trying all passwords; to prevent that, make sure you either have different hash algorithms or longer passwords. But it does prevent lookup-type attacks using rainbow tables.
It's hard to see the advantages, I think. We've gone through a lot of details, and many people may not see the benefits of the rainbow tables, but you can go back over the material up to that point and see how we store passwords, and remember that salting passwords is necessary.
One minor benefit of a salt as well: do any of these users have the same password? John, Sandy, Daniel or Steve: if you can see those hash values, do they have the same password? We can't know, because when you look at the hash values they're all different. But in fact they did: if we go back to the original passwords, John and
Daniel did have the same password. If we don't use the salt value, the hash values will be the same. By adding a random value, because they'll each get different random values, the hash values become different. So that's a minor benefit of using the salt as well: you get different hash values even for identical passwords. We're not going to look at how rainbow tables work, but you should be aware that at least you store a salt, a random value; the password is concatenated with that salt, and you calculate the hash of them combined. You store three values.
To finish today, let's bring up an example so that you can see a real storage of passwords. On Linux operating systems, where is the password information stored? Some of you have seen this in my lab. Where do I find the password information about the users on my computer? What's the file? It's in the /etc directory. Not interfaces, not services; there's another one or two files that we've mentioned in a lab, for those that have taken it. There's the /etc/passwd file. This file stores usernames, so we see some usernames, but the passwords are not actually stored in this file. It refers to another file, called the shadow file, and we don't have permission to look at that. So that's one protection. The shadow file is the database on my operating system. Now, we don't want anyone to access that database, so we use permissions to make that hard. But let's say there is a weakness in my system such that an unintended user can access it; here the weakness is that I know the password, so I'll switch to the root user. The root user can see that file, and we see the password information for the set of users: the root user and others, those three. So here we have three users on my system. It looks hard to read, but let's highlight one line. The line is separated into different fields. The first field is the username; here's the username. Then this long field contains three things, if you look
carefully: there are some dollar signs in there, and they separate this long field into subfields. So there's the username, the user ID. This number 6 tells us what hash algorithm is used. In our examples we said MD5, but there are other hash algorithms; number 6 refers to SHA-512. SHA is a hash algorithm, and this one produces a 512-bit output value. Between the next dollar signs, it's hard to see because they're random characters, but those characters are the salt. These are random characters generated by the operating system: when you set up the account, this value is generated and stored. And the last long set of characters, from here up to here, is the hash of the user's password combined with the salt value. It's stored in a form that we can print on the screen, but it actually is a 512-bit value; SHA-512 produces a 512-bit output. So our shadow file stores the username, the algorithm (using a number to identify it), the salt, and a hash of the password combined with the salt.
As the attacker, what do they need to do? Here's the hash value; go on and find our user's password. They know the salt value, they know the algorithm they need to defeat. The best way is to try and guess passwords. Here the password may not be limited to 8 characters; if the user was smart, he would have chosen a longer password. So you try all the passwords, and if the user chose a random 10-character password, then you'd need to do an attack that requires 94 to the power of 10 attempts, which, if it was MD5, would take something like 200 years, and with SHA-512 even longer. So using the salt is to defend against rainbow table attacks, and the hash makes it hard to go back to the original password. Of course, unfortunately, users don't choose random passwords. So if a user chooses a password from a dictionary, or something that has some structure, then the attack is easier: the attacker just needs to try those possible passwords, not all combinations of 8 characters but just words from a
dictionary. That depends upon the structure of the password chosen. So we'll stop there. That's an example of the password database as stored. If you forget everything else from today, then remember just one thing, the recommended way of storing passwords: store a salt, a random salt, and a hash of the password combined with the salt.
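As a rough illustration of that one thing to remember, here is a minimal Python sketch of storing and checking a salted password hash. It is a simplification under stated assumptions: it uses a single SHA-512 over the password concatenated with the salt, which shows the idea but is not the exact calculation behind the `$6$` entries in the shadow file (that scheme is more involved). The function names `new_record` and `verify`, the record layout and the sample passwords are made up for illustration.

```python
import hashlib
import hmac
import secrets


def new_record(user_id: str, password: str, salt_bytes: int = 8) -> dict:
    """Create the three stored values: ID, random salt, hash(password + salt)."""
    salt = secrets.token_hex(salt_bytes)  # random salt, stored in the clear
    digest = hashlib.sha512((password + salt).encode("utf-8")).hexdigest()
    return {"id": user_id, "salt": salt, "hash": digest}


def verify(record: dict, submitted: str) -> bool:
    """Hash the submitted password with the stored salt and compare to the stored hash."""
    digest = hashlib.sha512((submitted + record["salt"]).encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, record["hash"])


# Two users who happen to choose the same password (hypothetical values).
john = new_record("john", "mysecret")
daniel = new_record("daniel", "mysecret")

print(verify(john, "mysecret"))    # True: the correct password is accepted
print(verify(john, "wrongpass"))   # False: a wrong password is rejected

# The salts are random, so the stored hashes differ even though the
# passwords are identical (barring an astronomically unlikely salt collision).
print(john["hash"] != daniel["hash"])
```

The password itself is never stored, only the salt and the hash. Checking a login repeats the same calculation with the stored salt.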