We know we need to store, at minimum, some identifier for the user, a username for example, and the password. We said that storing them as is, especially storing the password in the clear, is not good, because anyone who obtains this database of usernames (or IDs) and passwords has discovered everyone's password. The password is confidential, important information, so we should design its storage such that even if someone gets access to the database, it is still hard to find the password. Even if there is some other flaw in the system that lets someone read the password database, it should still be hard for them to find the actual password.

So we don't store it in the clear. Think of "ID, P" as the columns in a table of our database: we don't do it like that.

We said another approach was to encrypt the password and store the encrypted password, so that someone who gets the database sees only the ciphertext. This achieves our aim that the attacker cannot find the password, but we must still store the key somewhere. Our password system must have the key to be able to decrypt, so where do we store it? If it is stored on the same computer as the database, someone who gets access to the database may also get access to the key, and once they have the key they can easily decrypt. So that is not the typical approach to storing passwords either.

Another way is to hash the passwords: store the ID and, instead of the password, a hash of the password. If the attacker obtains this database they can see the hash value, but the properties of our hash functions should be such that, given the hash value, you cannot go backwards and get the original password. This is the one-way property.
We require that property, and that is how the hash function helps us in this case: we can store the password information, here the hash value, such that even if someone finds it they cannot get the original password. That is what we are trying to do: protect the passwords so that an attacker who gets the database cannot find them.

To check a login, the user submits their password, and the system takes the hash of the submitted value and compares it to the stored hash value. Again by the properties of hash functions, if the two passwords, the one stored and the one submitted, are the same, then the hash values are the same. So we just compare hash values, and if they match, the login is successful.

So this is one approach for storing passwords, and we looked at an example. The first version stored the passwords in the clear, which is not good. Then we got to the version where we store the usernames and the hashes of the passwords, not the actual passwords. If the attacker finds this database, so they have this table of information, what they try to do is take, for example, Steve's hash value and work backwards: what password led to that hash value?

There are two basic approaches. The dumb approach, pure brute force, is to try every possible password as a string of random characters. For a hash function, the effort needed to find an input given a hash value depends on the length of the hash value. For example, MD5 produces a 128-bit hash value, so a brute-force attempt requires the attacker to make about two to the power of 128 attempts to find a password that leads to this hash value. So they try some random string and hash it. Does it match seven five one
two seven? No. Then they try another random string, hash it, and keep going, and on average it would take them two to the power of 128 attempts to get the right one. We calculated that if they had a computer that could run at one billion hashes per second, the time it would take would be on the order of ten to the power of 21 years or more. So the brute-force attempt doesn't work.

But the attacker can be smarter. The attacker knows that in most cases the passwords they need to try are not arbitrary random strings. They are usually of some limited length, since most passwords are shorter than some known value, and they often have some structure in them. So the attacker can try passwords of a particular length.

As an example, assume the attacker knows that the user doesn't choose a password longer than eight characters. Then what they need to try is all possible passwords that are one character long, two, three, up to eight characters long. And that's not so many. How many passwords are one character long? 94. How many are two characters long? 94 squared. How many are eight characters long? 94 to the power of 8. The password must be in that set, so that's how many the attacker would need to try.

Where does 94 come from? We said 94 is the number of printable ASCII characters that can be typed on a keyboard. Count the uppercase and lowercase letters, the digits, plus all the punctuation characters.
There are about 94 of them to choose from. So when someone chooses a password, the upper limit is that they can choose from 94 different characters for each position in their password. That's why we use 94 here. If you knew the password was limited to just uppercase and lowercase letters, it would be down to 52.

So what the attacker needs to do is try up to all of these possible passwords. One of them must produce the hash value that's stored in the database, so if they try them all they will get the right password. We did a rough calculation: if you could try them at 10 to the power of 10 per second (I increased the speed to 10 billion attempts per second), it takes about seven days. For the attacker that's much better than 10 to the power of 21 years, so this attack is possible. It's feasible.

How do we slow the attacker down? How do we force them to take longer? A longer password. Let's force our user to have nine characters in their password: don't allow eight or fewer. That means we have 94 to the power of 9, so we've increased the number of attempts needed by a factor of 94. That's 94 times seven days, which is about two years.

What else can we do to slow the attacker down? In this case the attacker is doing this on their own computer. They've somehow downloaded the database, copied it to their own computer, and they can use as many computing resources as they can afford to make as many attempts as possible. This is what we call an offline attack. The attacker is not, for example, submitting passwords to Hotmail. They've somehow downloaded the database of hashes from the website or computer system onto their own computer and are trying to break it there.

So we can increase the password length, which increases the amount of time. What else could we do? Assume they've chosen a password of eight or nine characters.
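The arithmetic above can be checked with a few lines of Python. This is only a sketch of the lecture's estimate: the function names are made up for illustration, and the rate of 10 to the power of 10 hashes per second is the lecture's assumed attacker speed, not a measurement.

```python
# Rough check of the search-space estimates quoted in the lecture.
ALPHABET_SIZE = 94        # printable ASCII characters (lecture's figure)
RATE = 10 ** 10           # assumed attacker speed in hashes per second

def search_space(max_len: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Number of passwords of every length from 1 up to max_len."""
    return sum(alphabet ** k for k in range(1, max_len + 1))

def days_to_try_all(candidates: int, hashes_per_second: float = RATE) -> float:
    """Days needed to hash every candidate once at the given speed."""
    return candidates / hashes_per_second / 86_400

up_to_eight = search_space(8)
exactly_nine = ALPHABET_SIZE ** 9

print(f"passwords of 1..8 characters: {up_to_eight:.3e}")
print(f"days at 10^10 hashes/s:       {days_to_try_all(up_to_eight):.1f}")
print(f"years for 9 characters:       {days_to_try_all(exactly_nine) / 365:.1f}")
```

Changing `ALPHABET_SIZE` to 52 shows how much the search shrinks when passwords are limited to letters only.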
The issue then is how many attempts you can make per second. We can't force the attacker to use slower computers, and they would choose the fastest possible. But we could try to use algorithms such that their computer takes more time per attempt.

So where did I get 10 to the power of 10 hashes per second from? It depends on the hardware, of course, and it also depends on the algorithm used for the hashing. For MD5, a recent computer can maybe do about 10 to the power of 10 per second. But there are other algorithms which are much slower, and that's a good thing for security, because slower to hash means it will take the attacker more time to do a brute force.

Where else did we get to? This attack involves the attacker taking seven days, and then they find the password. From the attacker's perspective, a better thing would be to generate all these hashes, 94 to the power of 8 possible hashes, and save them, say on a hard disk. Then comes the next time we want to break a hash value and find someone's password.
We just look it up. We don't have to recalculate all of these over seven days. We can just search through the database and find the value, probably within a few minutes.

So from the attacker's perspective, a better attack is this: once you've found the hash values for all possible passwords of up to eight characters, store all of that information in your own local database or on a hard disk. Then, when you want to break a hash value, simply look through that database for the corresponding hash value and find the matching password. Instead of taking seven days to calculate all the hash values, the attacker can look through the stored values and find it in, people say, the order of minutes, maybe hours in the worst case. Calculating hashes is much, much slower than doing a lookup for a particular value in a table or database.

In fact, people have done this. They calculate all the hashes, save them on a hard disk, and then sell the hard disk to other people who want to break passwords. Once you have the hard disk you can break passwords in minutes, or hours in the worst case. Therefore our storage of passwords is not very secure: if someone can find my password in minutes or hours, it's not very secure.

Now the problem with that is storing all those passwords and hashes.
We said there are about 94 to the power of 8 passwords. If one password is eight characters, that's eight bytes, and one MD5 hash is 128 bits or 16 bytes, so for every value the attacker needs to store 24 bytes, which is about 146,000 terabytes in total. So an attack that involves storing all of this data can speed things up from seven days down to minutes, but at the expense of requiring a lot of storage. Does anyone have a hard disk, or hard disks, for this? No. That's not good from the attacker's perspective, and that's almost where we got to last week.

It turns out that people have devised clever compression schemes: instead of saving the raw data, the 146,000 terabytes, there are algorithms to save all of this data in a much, much more condensed space, in the order of half a terabyte. 576 gigabytes is an example from someone who sells such a database of password hashes. You can buy a hard disk with all of these values on it. Then the attacker, instead of spending seven days calculating the hashes, just searches through that roughly 500 gigabyte database and finds the password in a matter of minutes, or hours in the worst case.

So there are ways to make the attack fast and the amount of storage space manageable; half a terabyte is manageable. Storing just a hash is therefore still not the best solution, because there are ways for attackers to find the password, given the hash value, within reasonable time and reasonable storage space. What the attacker does is buy a hard disk with pre-calculated hash values. Someone else has done the work for them: someone else has spent seven days, or months, calculating the hashes, saved them on a hard disk, and then sells it to the next attacker, who just looks through that database, which is much, much faster than calculating. The data structures that manage this compressed storage of hashes are called rainbow tables. We're
not going to study how they work, but you may hear of them when you come across password attacks. The idea is that storing the pre-calculated hash values reduces the attacker's search time from, say, seven days down to minutes or hours, at the expense of needing more storage space. But a few terabytes of storage is not a problem nowadays, so the attacks are practical in this case. We need a better way to store passwords than just the hash value.

The different approaches, the countermeasures to stop those attacks, are as follows.

Require the user to have longer passwords: nine characters, ten characters, and so on. What's the problem with longer passwords? They're inconvenient. If we require the user to have 15-character passwords, that's good for security but not so good for the convenience of the user.

The other thing is to use slower hash algorithms. Instead of MD5, choose an algorithm that makes the attacker take a long time to calculate the hashes. I gave an example of 10 to the power of 10 per second. If we use an algorithm that increases the time each attempt takes, that's better for security, and there are some algorithms that are actually designed to be slow, slow in the sense that one hash might take half a second as opposed to milliseconds or microseconds.

The last one, and the one we're leading to, matters especially because sometimes we cannot change the hash algorithm and are limited to the algorithms available: add some salt. It's called salting the password, which is just the name. The concept is that we introduce another random number before we hash. We'll look at that, and it's actually the recommended way of storing passwords.

So the solution is to store the ID of the user and to choose a random number, which we'll call a salt, some s-bit random number. The computer system chooses that, not the user.
So when the user registers, when they first create an account, the system creates and stores an s-bit random value, which we call a salt. When we hash the password, we don't just hash the password on its own. We hash the password combined with the salt. The salt is a random value, a sequence of bits, a number, and we can represent it in different formats, of course.

So now, when you create your account, instead of just storing the hash of your password, the computer system generates a random value for you. It doesn't tell you the value, and you don't need to know it. But it generates and stores that random value, called the salt, and it hashes the password combined with the salt.

Here's an example of that storage of information. Before we can have logins we have registration, by which I mean a user creating an account. The user chooses a password; that's what the user does. But the system generates a random salt, and this is what differs from the normal registration procedure. So when you create your account, maybe you've got a username and you choose your own password, and when that happens the system generates a random value.
We call it a salt. Here I've listed the random values as characters, but we can map them back to binary using some encoding, ASCII for example. So when John chose his password, the system chose a random salt and then calculated the hash of the password combined with the salt. Just concatenate the two: whatever password John chose, attach the salt to it, hash it, and you get a hash value. When Sandy chose her password, the system chose a different random value. It's random and it's per user. Again, the hash value is created from the password combined with the salt.

After they've registered, a user makes a login attempt. They submit their username and password, which is the normal approach. The system looks up the username in its database and finds, say, username Steve. It takes the submitted password, the one I typed in when I tried to log in, and concatenates it with my stored salt value. It takes a hash of that, and then simply compares the result to this stored 184b7 hash value. If they're the same, the login is successful. If they're different, the login fails.

So that's how registration and login work. Any questions on how it works before we look at why it is more secure? It's a common thing that people use when they implement password storage on websites and computer systems, so it's important to know how to do it. It's also nice to know why to do it: why is this better than not using a salt?
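The registration and login steps just described can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `users` dictionary stands in for the database table, the function names and the 64-bit salt length are assumptions, and a single SHA-512 call is used only to mirror the lecture's "hash the password concatenated with the salt".

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 8  # assumed 64-bit salt; the lecture suggests 32 bits or longer

# Stand-in for the database table: username -> (salt, hash)
users: dict[str, tuple[bytes, bytes]] = {}

def hash_password(password: str, salt: bytes) -> bytes:
    """Hash the password concatenated with the salt."""
    return hashlib.sha512(password.encode("utf-8") + salt).digest()

def register(username: str, password: str) -> None:
    """The system, not the user, chooses a fresh random salt per account."""
    salt = secrets.token_bytes(SALT_BYTES)
    users[username] = (salt, hash_password(password, salt))

def login(username: str, password: str) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    return hmac.compare_digest(hash_password(password, salt), stored_hash)

register("steve", "mysecret")
register("sandy", "mysecret")
print(login("steve", "mysecret"))              # correct password
print(login("steve", "wrongpass"))             # wrong password
print(users["steve"][1] == users["sandy"][1])  # same password, different salts
```

Note that the two accounts store different hash values even though the passwords are identical, because each account has its own salt.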
That's what we want to know. So let's summarize the problem with not using a salt. When we didn't use a salt, it was possible for the attacker to pre-calculate all the hash values for some reasonable set of passwords. For example, within a terabyte you can store the hash values for all passwords of up to eight characters. Then the attacker can just look up the hash value in that database, within minutes or hours in the worst case, and find the corresponding password. That's if we don't use a salt.

Let's write some of those numbers down, because we'll use them in the example. Assume the attacker has a database or a disk. When I say database, it could be stored in any manner: a file, some data structure, just a set of data. It has 94 to the power of 8 rows, and each row contains a password and a hash of that password. With rainbow tables that takes about 500 gigabytes to store, and let's say minutes to find a password.

That is, assume the attacker has this hard disk of about half a terabyte, and on it is stored a list of all possible passwords of up to eight characters (we're limited to that in this example) and the corresponding hash values. When the attacker knows a hash value, they search for it in the hash column, and once they've found it they've found the corresponding password. Such searches can be quite fast, in the order of minutes in practice, so the attacker can find a password quite quickly. That's the normal approach for the attacker when we just store the hash of the password.

But now what we're storing is a hash of the password and the salt. What does the attacker need to do now? Think from their perspective: you're trying to find someone's password given a hash value. What do you need to do?
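The pre-calculated table described above, rows of password and hash searched by hash value, can be shown in miniature. This sketch uses a plain dictionary over a deliberately tiny password set (lowercase letters, up to three characters); it is not a rainbow table, which the lecture does not cover. The helper names are illustrative, and the storage estimate simply repeats the lecture's 24 bytes per row.

```python
import hashlib
from itertools import product

def md5_hex(password: str) -> str:
    """Unsalted MD5 of the password, as in the lecture's example."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def build_table(alphabet: str, max_len: int) -> dict[str, str]:
    """Pre-calculate hash -> password for every password up to max_len."""
    table = {}
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            password = "".join(chars)
            table[md5_hex(password)] = password
    return table

def raw_storage_terabytes(alphabet_size: int = 94, length: int = 8,
                          bytes_per_row: int = 24) -> float:
    """Uncompressed size of the full table, counting 1 TB as 10^12 bytes."""
    return alphabet_size ** length * bytes_per_row / 1e12

# Done once, slowly; afterwards every lookup is a fast dictionary search.
TABLE = build_table("abcdefghijklmnopqrstuvwxyz", 3)

stolen_hash = md5_hex("cat")       # pretend this came from a leaked database
print(TABLE.get(stolen_hash))      # the lookup recovers the password
print(f"{raw_storage_terabytes():,.0f} TB for the full 94^8 table")
```

The one-off cost is in `build_table`; each later attack is a single lookup, which is the time-for-storage trade the lecture describes.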
Any ideas? Find the key for the hash? What key? There is no key in this one.

The normal attack involves the attacker looking through their own database, stored on disk. They compare the hash value they have to what's stored, and once it matches they've found the password. That doesn't take long.

By introducing a salt, this random value, the result is that the attacker would need a database that was created for the particular salt value used for that user. For example, the salt value for John was this random set of characters. For the attacker to use their own pre-calculated database, it must have been created for that particular salt value. If they want to find the password for Sandy, they need a database that was created with her salt value in mind. If they want to find my password, they need a database that was calculated using my salt value.

In general, for the attacker to succeed now, they need a database for every possible salt value, because they don't know in advance what the salt value will be. See what requirement that places on the attacker: when the system chooses a random salt value to store with the password and include in the hash, the attacker, to do this fast lookup that takes minutes, must have a database for the corresponding salt value. Since they do not know the salt value in advance, they should have a database for every possible salt value.

For example, what if the salt was a 32-bit random number? With a 32-bit random number the number of possible values is 2 to the power of 32. Therefore, for the attacker to be able to do this lookup, they should have one database for the first salt value, one for the second, one for the third, and so on for all of the 2 to the power of 32 possible salt values. If they do that, they can still find the password within a matter of minutes. But that has greatly
increased their storage requirements. One database was half a terabyte, and 2 to the power of 32 is about 4 billion, so now they need about 2 billion terabytes of storage space for this to work, which is not possible.

So the idea of introducing the salt is to make it hard for the attacker to do this attack, where they use pre-calculated hash values stored in a database. Without a salt, they can have one database, half a terabyte is not a problem, and they can find the password in a matter of minutes. With a salt, the attacker would need a database for every possible salt value. If, for example, we have a salt of 32 bits, they need 2 to the power of 32 different databases, and that's not possible to store because it's billions of terabytes. That's why we introduce a salt value, which is just a random number that is stored with the password and hashed with the password.

That's hard to get your head around, I know. It takes a lot of thinking about the different things the attacker can do. If you cannot capture all of that, at least always remember: we don't just store the hash of a password. We store the hash of a password combined with a random number, and we call that random number the salt value. The salt value is known and stored in the database. It prevents the use of pre-calculated databases because it makes the storage requirements too large: the space required increases by a factor of 2 to the power of s, where s is the number of bits in our salt. So all you need to do is make the salt a reasonable length, 32 bits or 64 bits, and the attacker can no longer use that pre-calculated database.

Any questions before we try to finish this part on passwords? Know why we do not store the password in the clear.
We don't just encrypt the password, because storing the key is then a problem, which is why we use hashes instead. Also try to remember that we don't just store the hash of the password, but the hash of the password combined with a random number. So when you implement, say, a database for your new website and there's a user login system, then somewhere in your database you would store this information: the username and the password they chose, but not stored in the clear. Choose a random number for that user, called the salt, store that, and hash the password combined with the salt.

Note that the attacker does know the salt value. If the attacker obtains the database, they learn the actual salt value. So the salt doesn't make it any harder for the attacker to do that seven-day attack and try all the passwords again. They can do that. But it makes it harder for them to use a database that was calculated in the past and quickly find the password within a matter of minutes. The salt only prevents the use of pre-calculated hash values stored in databases. It doesn't stop the attacker from recalculating the hashes. So the attacker in this case can still do an attack which succeeds in what we said was seven days, the time to calculate 94 to the power of 8 possible hash values. But they can't do it in a matter of minutes, because they don't have all the values stored.

There are some other things about the recommended ways of storing passwords. The basic way is summarized here, and there are other modifications which we will not get into. But basically: store the ID and a random salt, and hash the password combined with the salt.

What should the salt be? Random.
It's generated by the system when the user creates their account, and the user doesn't need to know it. 32 bits or longer is usually recommended, and you will see some much longer.

What hash function should you use? You want a slow hash function, because slow means the attacker will take a long time to make many attempts. If it's a very, very fast function, they can make more attempts per second. There are hash functions which are designed to be adaptive in speed, that is, you can set a parameter, the work factor, to slow them down. Those are recommended, and some are listed there.

Another important thing, and what we assumed all along, is that when you're designing your password storage scheme, you should assume someone will get your database. You want to protect your database so that no one can access it, but you should design your storage by asking: what happens in the future if someone does get my database? That's why we don't store passwords in the clear. Assume that something will go wrong in the future and someone will get access to your database, and make it such that it's still hard for the attacker to find the passwords. Design for failure.

We'll look at an example just briefly. Here's one of my virtual machines, which has a set of users. Can you see? Not quite. Let's see the users on here; I may have to zoom out. Remember the password file, which stores user information. This one has a set of users already on this computer.
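The idea of an adaptive hash with a work-factor parameter, mentioned above, can be sketched with PBKDF2. PBKDF2 is chosen here only because it ships with Python's standard library; the transcript does not say which functions the slide lists, so treat the choice, the function names, and the iteration counts as assumptions.

```python
import hashlib
import secrets
import time

def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256; a larger iteration count makes each guess slower."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations)

def seconds_per_hash(iterations: int) -> float:
    """Time a single hash at the given work factor on this machine."""
    salt = secrets.token_bytes(16)
    start = time.perf_counter()
    slow_hash("example password", salt, iterations)
    return time.perf_counter() - start

# The work factor scales the cost per attempt for user and attacker alike.
for work_factor in (1, 1_000, 100_000):
    print(f"{work_factor:>7} iterations: {seconds_per_hash(work_factor):.6f} s")
```

A legitimate login pays this cost once, while an attacker trying 94 to the power of 8 candidates pays it for every attempt, so the work factor can be raised as hardware gets faster.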
So this is the password database. Well, not exactly, because it doesn't store the password information. This is the user database: the username or ID and some user information. This x field here actually means there's another file that contains the password information, and that other file is the shadow file. That's the first security mechanism: provide access control so that users who are not allowed cannot access this file. But what happens if we make a mistake, or some user who gets permissions can access the file? Since I have permissions, I can look at it.

Now we can see the actual password database. It looks confusing, so let's just grab one line and look at it in detail. Each user has one line with this information. Let's look at the structure. The fields are separated by the colon character. We have a username, which is the user ID. We have this long field here, which we'll look at in depth; that's the hash value plus some other information. Then these numbers at the end say something about when the last password change was and when the next password change should be, so the system can store how many days you have before you are prompted for a new password. Some systems will require you to change the password. We'll ignore those numbers for now.

So there's the username and then this long field, which is actually split into three sub-fields separated by dollar signs. It's hard to see, but the number six is one field, this value is the second field, and this long one is the hash value. Of those three fields, the number six indicates the hash algorithm used. Our system can choose from different hash algorithms, and six, you can look up the details, refers to SHA, the hash algorithm called SHA, with a 512-bit hash value. So our system actually records that for this user we used the algorithm SHA-512.
That's what six means there.

What's the next value? That's the salt, some random number that my operating system chose when this user created their account. The format it's stored in is some modified ASCII encoding. It's not actually ASCII, it's slightly different, but just think of it as a binary value. Of course, to show it on the screen and save it in a text file we need a set of characters. So that's the salt.

And this is the hash value, again a random-looking value, which was obtained by taking the user's password, combining it with this salt, and hashing it with SHA-512. That's the algorithm. We store just this, so that when someone gets access to this database, their challenge is to take this hash value and go backwards to find the password. That takes a lot of effort: with a 512-bit hash value it takes two to the power of 512 operations, assuming the user has a good password. We calculated that two to the power of 128 would take on the order of 10 to the power of 21 years, so a brute-force attack is not going to work here. But of course the attacker can try passwords of the length they think the user chose, eight-character passwords for example.

So that's just an example of the storage of the password database on a Linux system. It's not an SQL database, it's just a plain text file, but it stores the user ID, the salt, the hash value, and also the algorithm, because when you can have multiple algorithms you should store which one was used.

Any questions before we finish on password storage? Try to look on your own computer and find the database that stores the passwords. Whether it's a Mac or Windows, there's some storage of the password information.
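The field layout walked through above can be pulled apart with a short parser. This is a sketch only: the sample line is invented for illustration (the salt and hash strings are placeholders, not real values), the names are hypothetical, and the mapping of identifiers other than 6 reflects the common Linux convention rather than anything shown in the lecture.

```python
from typing import NamedTuple, Optional

# Identifier in the first $-separated sub-field -> hash algorithm.
ALGORITHMS = {"1": "MD5", "5": "SHA-256", "6": "SHA-512"}

class ShadowPassword(NamedTuple):
    username: str
    algorithm: str
    salt: str
    hash_value: str

def parse_shadow_line(line: str) -> Optional[ShadowPassword]:
    """Split one shadow-style line into username, algorithm, salt, hash."""
    fields = line.strip().split(":")
    if len(fields) < 2:
        return None
    username, password_field = fields[0], fields[1]
    parts = password_field.split("$")   # "$6$salt$hash" -> ["", "6", "salt", "hash"]
    if len(parts) < 4 or parts[0] != "":
        return None                     # locked account or unrecognized format
    scheme_id, rest = parts[1], parts[2:]
    if rest[0].startswith("rounds="):   # optional work-factor sub-field
        rest = rest[1:]
    if len(rest) != 2:
        return None
    salt, hash_value = rest
    algorithm = ALGORITHMS.get(scheme_id, f"unknown ({scheme_id})")
    return ShadowPassword(username, algorithm, salt, hash_value)

# Invented sample line; the trailing numbers are the password-ageing fields.
SAMPLE = "steve:$6$Ab3dEf9h$PLACEHOLDERHASHVALUE:18000:0:99999:7:::"
print(parse_shadow_line(SAMPLE))
```

Reading the real file normally requires administrator permissions, which is the access-control layer the lecture mentions first.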