Last week we went through passwords and finished with some discussion of how to store them. We looked at the Linux operating system as an example, but the same applies to any system that has a login, websites in particular. The normal approach is that the system stores some user identifier, like the username, and a corresponding password, so that when you log in, you submit your username and password and, if they match the stored values, you are authenticated. We want to consider some issues with how to store that password so that no one can get access to it. I've written up some discussion with an example, which we'll go through here. Last week we got to the point of asking: if we store the password, what can an attacker do if they get access to that stored database of passwords? So we'll summarize that. The example is a little different from the one we started last week because I've tried to use some more realistic numbers. I sent an email yesterday; everything that I'm going to go through today is described on this website, so you can see it and read about it. I'll just pull some numbers and examples from that. We'll look at different cases. So the idea is the simplest approach on the system. Let's say it's either the Linux operating system or maybe a website that you're developing. You create a website and you've got many users. Those users register, they select a username and password, and you as the programmer, as a first approach, store in a database on your system the username and the password they chose. So that's stored on the system. When the user wants to access the system, say via a web page, they submit their username and their password, and your code, for example some PHP code, checks the submitted values against the stored values in this database. If they match, everything's okay. The user is allowed in.
If they don't match, the user is rejected or has to try again. So this is an example of the storage of the user information. Username and password is the basic information. Now this works for authentication, but there are a number of problems if we store this information. First problem, a simple problem. Let's say you develop a website and you create this database, and you've got a thousand different users. There are several administrators for your website, so you and a few of your friends have set up a company to create this website. All of you, if you can see this database, can immediately see all the users' actual passwords. That can be a problem, in that you've now discovered some user's password and most likely those users use that same password for other systems as well. So it's very easy for someone who can read the database to see another user's password. In fact, since you're the administrator of the system, the one who created the website, there's not much we can do to stop you from viewing the users' passwords. A bigger problem, assuming we trust you who created the website, arises if this database, which is on your web server, gets released to the public. That can happen if, for example, there's some other security flaw in your server and someone hacks into your server. They break in using that other security flaw and they get access to this database. If that malicious user can get access to this database, that malicious user now knows all the usernames and the corresponding passwords. So the malicious user can log in to your system using someone else's username and password.
But also, potentially, these users use similar identifying information and the same password on different systems. For example, you use the same password on Hotmail and on Gmail and on your bank account. If the malicious user can guess your username on those accounts, which is very easy for email, then they also know your password for those accounts. So that's a big problem. If the database becomes public, storing the actual password is bad because a malicious user can immediately find that actual password. The challenge then is how do we stop this? And we said last week that we store a hash of the password. So here's a case. Instead of storing the user's password, we store their username and a hash of their password. So now when John logs in, he submits his username and his password, which in the previous table was MySecret, and your system takes a hash of the submitted password and compares it to this stored value, 0, 6, and so on. If the submitted password is the same as the registered password, then the hash values will be the same, because if the two inputs to the hash function are the same, then the two hash values will be the same. The properties of the hash function should be that it produces the same hash value if the inputs are the same, and we normally assume that two different inputs will produce different hash values. So by storing the hash of the password, we can still authenticate the user. We can still check if they've submitted their correct password. And we've got this additional advantage that if a malicious user gets access to this database, they cannot immediately see the password. So that's an important feature of how we store passwords. To prevent someone seeing the password, store a hash of the password. So that's what we got to last week. What does the malicious user do to find John's password?
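The hash-based check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the `users` dictionary stands in for the database, the function names are made up for this example, and MD5 is used only because the lecture's example hashes are 128-bit MD5 values.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return the hex MD5 digest of a password (MD5 only to mirror the lecture)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the database: username -> stored hash.
# The password itself is never stored.
users = {"john": hash_password("MySecret")}


def login(username: str, submitted_password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored_hash = users.get(username)
    if stored_hash is None:
        return False  # unknown user
    return hash_password(submitted_password) == stored_hash


if __name__ == "__main__":
    print(login("john", "MySecret"))  # correct password
    print(login("john", "mysecret"))  # wrong password
```

Someone who reads `users` sees only the digest, not MySecret, which is the advantage the lecture points out.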
Given this information, how do you, if you're a malicious user, try to find John's password using this database? What are you going to do? You're too smart. Explain to the other students what a rainbow table is and how you are going to use it. Yes, use a rainbow table, but more generally, what's the approach? You don't need a rainbow table. Okay, and where does this public database come from? How is that created? Okay, you download some public database, but who created this public database and how? Take some words or some potential passwords, calculate the hash and store them. Even better, take all possible passwords, calculate the hash and store them. So before we get to rainbow tables, take the simplest approach, although we'll see the performance is not good. If I know the hash value and I want to know the password, then we do a brute force attack. A brute force attack takes all possible passwords, hashes them and compares each resulting hash value against the hash value we've got, which is the one we're looking for. To keep things simple, let's assume all the passwords are eight characters in length, as they were in this case. Then what we do, as the attacker, is choose an eight-character password, calculate the hash of that, and compare the calculated hash against this one. Does it match? If so, then we've found the password for John. If not, try a different password, a different eight characters, and just keep going. How many attempts do we have to make? It depends upon the length of the password and the number of possible characters in the password. So, as an example, and the example in this case, the password is eight characters long, all passwords chosen by the users are eight characters, and each of those characters is a printable character.
So uppercase, lowercase, numbers, punctuation characters. I think if you don't count the space, there are 94 possible characters, 95 if you count the space. Therefore, the total number of passwords, and we did this last week, is 94 to the power of eight. So a brute force attack would be to take all of these 94 to the power of eight possible values, take the first value, calculate the hash, and compare it to this stored hash. If it matches, good, we've found the password. If not, keep trying. Worst case, we make 94 to the power of eight attempts. Assuming the user chooses a random password, the average case is half of that. 94 to the power of eight is about 6 by 10 to the power of 15. The problem with that approach for the attacker is that it may take too long, and the limitation on how long it takes is primarily the hash function. What we do is take a candidate value, which is easy and fast to generate, calculate the hash of that, and compare the hash to this stored one. Hash functions are generally slow compared to the other operations. So the comparison is fast; it's the hash function that is slow. How fast computers can calculate hash functions depends upon the hardware, the software, and even the hash function. As an example value, I'll use a rate of 10 to the power of 10 hashes per second. So let's say I have a computer, just a PC, with a good graphics card, because graphics cards can do this processing much faster than a CPU. If I can calculate 10 billion hashes per second, the worst case I need to do is 6 by 10 to the 15 hashes. So 6 by 10 to the 15 divided by 10 to the power of 10 is 6 by 10 to the 5 seconds. So 600,000 seconds approximately. It's about seven days. You can check the calculations.
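To check the calculations, here is a short Python sketch of the arithmetic. The alphabet size, password length and hash rate are the example values from the lecture; the variable names are just for illustration.

```python
ALPHABET_SIZE = 94           # printable characters, not counting space
PASSWORD_LENGTH = 8          # every password assumed to be 8 characters
HASHES_PER_SECOND = 10**10   # example rate used in the lecture
SECONDS_PER_DAY = 24 * 60 * 60

# Number of candidate passwords the attacker may have to try.
total_passwords = ALPHABET_SIZE ** PASSWORD_LENGTH

# Worst case: every candidate is hashed; average case: half of them.
worst_case_seconds = total_passwords / HASHES_PER_SECOND
worst_case_days = worst_case_seconds / SECONDS_PER_DAY
average_case_days = worst_case_days / 2

if __name__ == "__main__":
    print(f"candidate passwords: {total_passwords:.3e}")
    print(f"worst case:   {worst_case_seconds:.3e} s = {worst_case_days:.2f} days")
    print(f"average case: {average_case_days:.2f} days")
```

The worst case comes out at a little over seven days, and the average case at about three and a half.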
That is, if I run this software to try all possible passwords, the worst case is that I take seven days to find John's password. Of course I could do it for multiple users at the same time and get the passwords of not just one but many users within those seven days. So that's one approach. You don't need rainbow tables, just a brute force attack, and you obtain the passwords in this example in seven days. The next thing is, okay, can we make it faster? What if I want to find someone's password in an hour, or in less than a day? Well, there are different approaches. Use more computers. Use more powerful computers. Instead of just using one computer, get a collection of computers and you reduce the time. So if you've got 10 computers, then we reduce it down to one-tenth of that time. So that's a simple approach. Of course it increases the cost and assumes you have access to those computers. Another approach is that once someone has calculated these hash values, and there are 6 by 10 to the power of 15 of them, they store them in a database. They store each password and the corresponding hash value. It takes them seven days to calculate them, and at the end they have 94 to the power of 8 passwords and corresponding hash values stored in a database. But once it's created, other people can use that same database, because if you have a different hash value then you just look it up in that database. And generally looking up in a database is much faster than calculating a hash function. So given this database, it's quite easy: if you have John's hash value, you simply look in the hash column until you find the matching value, and then you've found the password. So that's the next approach. Get someone else to calculate the hash values and then just perform a lookup on their database. The problem with this is that we need to store a lot of information, and we calculated that last week.
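The two approaches so far, brute force and a precomputed lookup database, can be compared on a toy scale. The sketch below is an illustration under simplifying assumptions that are not from the lecture: passwords are only three lowercase letters so that it runs instantly, and a Python dictionary plays the role of the database.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def candidates(length: int, alphabet: str = ascii_lowercase):
    """Generate every possible password of the given length."""
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)


def brute_force(target_hash: str, length: int):
    """Hash each candidate in turn until one matches the target hash."""
    for candidate in candidates(length):
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


def build_lookup_table(length: int) -> dict:
    """Precompute hash -> password once; later attacks are just lookups."""
    return {md5_hex(candidate): candidate for candidate in candidates(length)}


if __name__ == "__main__":
    target = md5_hex("cat")               # the hash found in the leaked database
    print(brute_force(target, 3))         # recomputes hashes on every attack
    table = build_lookup_table(3)         # 26**3 rows, built once
    print(table[target])                  # a single lookup, no hashing
```

The table costs the same hashing work as one full brute force run, but after that every lookup is cheap. The price is storage, which is the problem the lecture turns to next.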
Every password and every hash value takes up space. The hash value is 128 bits, which is 16 bytes, and the password is 8 bytes, so that's 24 bytes in total for every row. 94 to the power of 8 times 24 bytes is about 146,000 terabytes if we want to store this database in an uncompressed form. So that's not feasible. If we did have it, we'd be much faster. We could cut down from 7 days to something much shorter, and we'll give an example shortly, but storing it in its raw form is too big. And that's where we use either some form of compression or, even better, some specialized data structure that can store this information in a much smaller space. And that's what rainbow tables do. Think of them as a data structure that, rather than storing the password and the hash as is, stores a smaller representation of them: effectively the same information but compressed, where that compression is designed to be very efficient when we're storing passwords and hashes. We're not going to explain how they work, but I looked up an example. In a corresponding rainbow table, this 146,000 terabytes of information was stored in 576 gigabytes. That is, using this special data structure, instead of needing 146,000 terabytes, you need just about half a terabyte. And now, what does the malicious user do? It still takes someone the 7 days to generate that table, but once you have the table and you have someone's hash value, you perform a lookup. You try to find this hash value in the table, and once you find the value you've found the password. How long does that take? Some examples are in the order of minutes; it depends. You can go to different websites and pay some money to get such tables.
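The storage figures above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only. It assumes decimal units (1 terabyte = 10^12 bytes, 1 gigabyte = 10^9 bytes), which is an assumption of this example rather than something stated in the lecture.

```python
PASSWORD_BYTES = 8         # 8-character password, one byte per character
HASH_BYTES = 128 // 8      # 128-bit hash = 16 bytes
ROW_BYTES = PASSWORD_BYTES + HASH_BYTES

TERABYTE = 10**12          # assumed decimal units
GIGABYTE = 10**9

rows = 94**8               # one row per possible password
raw_bytes = rows * ROW_BYTES
raw_terabytes = raw_bytes / TERABYTE

# Size of the example rainbow table quoted in the lecture.
rainbow_table_bytes = 576 * GIGABYTE
space_saving_factor = raw_bytes / rainbow_table_bytes

if __name__ == "__main__":
    print(f"bytes per row:   {ROW_BYTES}")
    print(f"raw table size:  {raw_terabytes:,.0f} TB")
    print(f"rainbow table is about {space_saving_factor:,.0f} times smaller")
```

Under these assumptions the raw table is a little over 146,000 terabytes, and the 576 gigabyte rainbow table is roughly a quarter of a million times smaller.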
So this is a real one. You pay, I think the cost of this was $1,250 USD, and they will send you a hard disk that contains this 576 gigabytes. Then you use their software, which they also give you: you supply a hash value and it returns the corresponding password. And some example values that they reported were in the order of 5 to 30 minutes to find the password. So now, once someone has generated this data, the time to find the password is cut down from 7 days to minutes. So that's much easier for the malicious user, much faster. That's where we got to last week. How do we address that? Well, make the password longer, that's one way. All of these were 8 characters. Some were not random, but suppose they were 8 random characters. If we made them 9 characters, one more character in every password, it would become 94 to the power of 9 passwords that you need to attempt. So multiply this by 94, or about 100, a little bit less than 100. So the time to generate this table would be multiplied by a factor of about 100. So instead of 7 days to create this table, it would take the malicious user almost 2 years. So just by increasing the password length by one character, we get a significant improvement in security. And it's not just the time to generate the table that is multiplied by about 100, but also the size of the table. So instead of half a terabyte, about 50 terabytes. Again, it's becoming much more expensive to store that. So increasing the password length is one way. But users don't like to have to remember long passwords. So another approach is to add some random characters to the password before you store it. Those random characters are called a salt. Here's an example. When each user registers their username and password, our system chooses a random value to combine with the password. It's called a salt.
So when John registered, John chose the password in the previous table, MySecret. The system chose a random salt. In this case, I chose five random characters from the printable characters. And instead of storing a hash of John's password, we concatenate the password with the salt and take a hash of that. So this value here is the hash of John's password and the salt value. I'll demonstrate that. We can calculate the hash of a particular string. Make sure we don't have newlines. John's password was MySecret, and md5sum calculates a hash. So the hash of just MySecret is this value starting 0, 6, C, 2. Sorry, it's a bit big now. If we attach some random characters to this password before we hash it, in our case A, 4, H, star, 1, then of course we get a different hash value, and we store that hash value: not the hash of the password alone but the hash of the password concatenated with the salt value. How does that help us? Well, we can still authenticate users in our system. The database stores the username, the hash value, and the salt, okay? So now when John logs in, what does John submit? John submits his password, MySecret. And what does your system do when you log in? The system takes the submitted password and concatenates the salt that is stored in the database here. It takes the hash of that, and we get some value. If it matches this stored value, everything's okay. If it doesn't match, failure. So that's effectively the same set of operations as before, except we concatenate the salt value. So we can authenticate the same way. What about a brute force attack? What does an attacker have to do in this case? What does the attacker do now, in a brute force attack, not with rainbow tables? Okay, so an attacker does the same as before.
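The salted registration and login steps described above can be sketched as follows. This is an illustrative Python sketch, not the code behind the lecture demo: the dictionary `db`, the function names and the use of the `secrets` module are assumptions of this example, and MD5 is used only because the demonstration uses md5sum.

```python
import hashlib
import secrets
import string

# The 94 printable characters (letters, digits, punctuation; no space).
SALT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def new_salt(length: int = 5) -> str:
    """Pick a random salt; five characters as in the lecture's example."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def salted_hash(password: str, salt: str) -> str:
    """Hash the password concatenated with the salt."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


def register(db: dict, username: str, password: str) -> None:
    """Store username -> (salt, hash of password+salt); never the password."""
    salt = new_salt()
    db[username] = (salt, salted_hash(password, salt))


def login(db: dict, username: str, submitted_password: str) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    record = db.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    return salted_hash(submitted_password, salt) == stored_hash


if __name__ == "__main__":
    db = {}
    register(db, "john", "MySecret")
    print(db["john"])                      # (salt, salted hash)
    print(login(db, "john", "MySecret"))   # correct password
    print(login(db, "john", "wrong"))      # wrong password
```

The user still only types MySecret. The salt is generated, stored and re-applied entirely by the system.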
We've got 94 to the power of eight possible passwords. We take one. The attacker knows the salt value because it's stored in this database here, and we're assuming the attacker has got access to this database. They know the hash value, they know the salt. So the attacker takes the password, concatenates the salt, calculates the hash, and compares it to the stored hash. If it's the same, okay. If not, move on to the next password. And it takes them the same time as before, because they need to try 94 to the power of eight passwords, taking about seven days. So a brute force attack still takes seven days. The salt doesn't help in that case. But what about a rainbow table? Remember, our rainbow table stored the password and the hash value. Now, if we want to use a rainbow table, we would need a rainbow table for each possible salt value. For example, we'd create a rainbow table with 94 to the power of eight passwords, where for each one we take the password, concatenate it with the salt value, and then hash and store those hash values. So for each possible salt, we create a rainbow table, the same size as before, 576 gigabytes. But we need to do it for every possible salt if this is going to work in general, because the malicious user doesn't know what the salt is in advance. And you see the salts are different for each user. So if we have a rainbow table for this salt value, we can find John's password. But if we want this to work in general, if we don't know the salt in advance, then we'd need to calculate a rainbow table for every possible salt and store them, so that if we have Daniel's hash, then we look up the rainbow table for his salt value, five, less-than, A, S, four, and then we find the password. So to use rainbow tables in this case, we must have a rainbow table for every possible salt value. How many possible salt values are there in this case? 94 to the power of 5.
Because in this case, the salt is chosen from 94 possible characters and it's five characters in length. 94 to the power of 5 is about seven billion. By adding this five-character salt, we have about seven billion possible values. Therefore, if we want to use rainbow tables as the malicious user, we need to pre-calculate about seven billion different tables. So we've just multiplied this half a terabyte by seven billion. So now we need about four billion terabytes of storage. So that's impossible. And of course, we need to generate them. It takes seven days to generate one rainbow table, multiplied by seven billion. That's the time it takes our attacker or malicious user to generate all those rainbow tables. Not possible even with supercomputers. So by introducing the salt, this random set of characters which we concatenate with the password before we hash, we don't prevent a brute force attack. Nothing changes from that perspective. We can still find the password in seven days. But we do prevent rainbow tables from being used. If we allow rainbow tables to be used, then the attacker can find the password in a short amount of time. Introducing a salt prevents such attacks. So in summary, the recommended way to store passwords is to take the password and a random salt, combine them, and store the hash of the salted password. You still store the salt because you need it when you check the login. Any other benefits of the salt? What's John's password? If we go back to the original passwords, we have a set of users and the original passwords. By chance, John and Daniel chose the same password. That's possible. A set of users can use the same password. If we just store the hash, we'll store the same hash value for both. So something that can happen here, and it's a very minor issue, is that if Daniel sees that John has the same hash value, then Daniel knows John's password. Because if they have the same hash value, they must have the same password.
But if we use a salt, Daniel sees his hash value. He sees everyone else's hash values. He does not know that John has the same password because there's a different salt here. So that's a very minor benefit of using a salt: if two people have the same password, different hash values will be stored. The salt, in this case, we assume is seen by whoever has access to this database. Now, this database should be protected. No one but the people who need it, the administrators, should have access. But sometimes we cannot protect it. That is, sometimes if there are other flaws in the system, someone may get access to that database. And it's a big problem if they do and if we don't store the passwords appropriately. If we store just the password and someone did get access to this, then we've immediately lost the passwords. But if all goes well, no one can read this database who doesn't have permission to do so. So all of this analysis is under the assumption that someone malicious can get access to this database. So that's the reason why with passwords you store not the actual password, but a hash of the password, and why you normally should also use a salted password. Any questions or issues about how that works? In summary, the general recommendation when you're storing user login information, if you're creating a website or some application that users have to log into, is to always store a hash of a salted password. A salted password is a password concatenated with a salt. In my example, the salt was five random characters, which is equivalent to roughly a 33-bit random number. You can make it longer. There's not much of a problem with making a larger random salt. The user doesn't have to remember it. It's generated by the system and stored by the system. It doesn't take up much space. So a 64-bit salt or longer makes rainbow tables impossible.
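As a rough check of the salt arithmetic, and of the point that two users with the same password end up with different stored hashes, here is a small Python sketch. The two salt strings are made-up placeholders, not the salts from the lecture's table, and decimal units (1 terabyte = 1,000 gigabytes) are assumed.

```python
import hashlib
import math

SALT_LENGTH = 5
ALPHABET_SIZE = 94
RAINBOW_TABLE_GB = 576            # size of one table, from the lecture

# Number of possible salts, and the equivalent number of random bits.
salt_values = ALPHABET_SIZE ** SALT_LENGTH
salt_bits = math.log2(salt_values)

# One rainbow table per possible salt (decimal units assumed).
total_terabytes = salt_values * RAINBOW_TABLE_GB / 1000


def salted_hash(password: str, salt: str) -> str:
    """MD5 of password concatenated with salt, as in the lecture's scheme."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


# Same password, different (hypothetical) salts.
john_hash = salted_hash("MySecret", "aaaaa")
daniel_hash = salted_hash("MySecret", "bbbbb")

if __name__ == "__main__":
    print(f"possible salts: {salt_values:,} (about {salt_bits:.1f} bits)")
    print(f"storage for all tables: {total_terabytes:.2e} TB")
    print("same stored hash?", john_hash == daniel_hash)
```

Computed directly, 94^5 is roughly 7.3 billion, a little under 33 bits, and storing one 576 gigabyte table per salt would need on the order of four billion terabytes.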
Never store the actual password, because if someone does get access to the database, they will immediately find the password. And avoid using unsalted password hashes. So at least take a hash of the password, preferably of a salted password. Of course, we haven't addressed other issues, like making sure a user doesn't choose a password which is easily guessed, dictionary passwords and so on. So the other thing is to make sure your users understand how to select secure and convenient passwords, and to consider the trade-offs between, for example, the length required, the set of characters allowed and so on. There's no one best solution; it depends upon the security requirements, who your users are and how they use the system. And that finishes passwords. On the slides, there's an example using Linux, but it's the same approach. The password database is stored in a text file, but it's the same concept, and a salt is also stored. Let's have a quick look. On my system, for example, I created a user called John and, last week, one called Sandy. Their user information is stored in one file, the /etc/passwd file. In another file, the shadow file, /etc/shadow, there's the actual hash value. So on a Linux system the shadow file stores the hash of the user's password, or rather the salted hash. So here's the username, then we have this long string: from this first dollar sign all the way along to this point here, where the slash is the last character. In fact, it's separated into three parts by the dollar signs. Six is the algorithm used for the hash, in this case SHA-512. It's a number indicating which hash algorithm to use. The next value between the dollar signs is the actual salt. That was generated by the system. And then from this dollar sign to the end, this slash, that's the hash value. Zero, one, u, seven, up to p slash is the hash of Sandy's password concatenated with the salt, zero or o, w, z and so on.
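The field layout described above can be pulled apart with a few lines of Python. This is a hedged sketch: the example entry is made up (it is not the entry shown in the lecture and its hash part is not a real digest), the function names are hypothetical, and the sketch assumes the simple three-part form `$id$salt$hash` with no extra parameters. The identifier numbers 1, 5 and 6 are the ones commonly used on Linux for MD5, SHA-256 and SHA-512.

```python
# Commonly used algorithm identifiers in Linux password hash strings.
ALGORITHMS = {"1": "MD5", "5": "SHA-256", "6": "SHA-512"}


def parse_hash_field(field: str) -> dict:
    """Split '$id$salt$hash' into its three parts."""
    parts = field.split("$")
    # The field starts with '$', so the first piece is an empty string.
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("expected the form $id$salt$hash")
    algorithm_id, salt, digest = parts[1:]
    return {
        "algorithm": ALGORITHMS.get(algorithm_id, "unknown"),
        "salt": salt,
        "hash": digest,
    }


def parse_shadow_line(line: str) -> dict:
    """Take a colon-separated shadow entry and parse its second field."""
    username, hash_field = line.split(":")[:2]
    result = parse_hash_field(hash_field)
    result["username"] = username
    return result


if __name__ == "__main__":
    # Made-up entry for illustration only.
    example = "sandy:$6$Ab3dEf9h$NotARealHashValueJustAnExample/:16000:0:99999:7:::"
    print(parse_shadow_line(example))
```

At login the system would read the algorithm and salt from this entry, apply them to the submitted password, and compare the result with the stored hash part.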
That is, the password is concatenated with the salt and then hashed using SHA-512. The resulting value is stored here. So when Sandy logs in, she submits her password. The system concatenates the submitted password with this salt value, takes a hash using SHA-512, and compares the resulting hash value to this one. If they match, she's allowed to access the system. If not, she has to retry or cannot access it. So it's the same concept. This is just the password database. Okay, everyone can answer questions in the quiz on passwords. We've gone through online and offline attacks. Entropy last week, and the storage of passwords last week and today, are the main concepts we've gone through. There are many other issues about how to select good passwords, but we'll stop there. That's the structure in the shadow file that I showed you: username, algorithm, salt, hashed password, where the hashed password is the hash function applied to the password concatenated with the salt. Something we haven't mentioned: if the salt was secret, then it would effectively increase the strength of the password. It makes the passwords longer. But keeping the salt secret is difficult. You could encrypt the salt, but you need a key to encrypt the salt with, and you need to store that key somewhere. So normally the salt is stored in the password database. Someone who has access to that database has access to the salt. What we want to try to do today, and partially next week, is finish this topic, and I think it'll be our last topic: transport level security. This last topic is about what protocols we can use, the common protocols available, for securing our internet communications, mainly between applications. So we have one client application which wants to communicate with a server application. How do we make sure the data sent across the internet is secure? And we use two common examples here. One, at least, you use on a regular basis.
The other you've used occasionally. Together they demonstrate some secure internet protocols. Transport level security refers to the transport layer in the five-layer stack. Let's first look at web, or even general internet, security issues. When you're using the internet to send data, how do we secure that? And what threats arise, leading us to want to secure that data? First, the original internet protocols, TCP, IP, HTTP, which we use on a daily basis, were designed with no security in mind. TCP, HTTP, IP: none of them have any encryption features built in. These protocols were built 30 or 40 years ago, and the internet wasn't envisaged to be used as such a large system as it is today. It originally connected some universities and research organizations together, and everyone trusted each other. But now we have many different network providers, internet service providers, and we use the internet not just for personal communications but for business communications. There's a need for security in the internet, especially for web browsing, in that there are a number of threats or issues that arise because we don't have security mechanisms in these protocols. So for web browsing and other internet applications, if we want to make sure no one can see our data and to be able to authenticate the people we're communicating with, then we need some extra security mechanisms. The main issues are at the client and the server, for example, authenticating the client and the server. If you go back to one of our first lectures, there's a cartoon, the common cartoon that says on the internet nobody knows you're a dog, because when you're communicating with someone, you cannot see them. The network protocol is just using IP addresses, so there's no way to confirm who is using that IP address to send you packets.
So we need some form of authentication to make sure that we're communicating with the person that we think we're communicating with. Then there's the traffic, the information we send between client and server. In a number of cases, we want to make sure that no one else can see that data. The common example is accessing your bank website. You don't want anyone else to see your username and password when you log into your bank website. And in most cases you don't want people to see the information being transferred between your browser and the bank web server, for example your account balance, who you've sent money to and so on. That data between your web browser and the bank web server is of course sent across the internet. How can we make it confidential? Because the original internet protocols, IP, TCP and so on, do not have built-in security mechanisms, some extra mechanisms have been built specifically to provide security. In the internet, the main protocols for encryption and security in general are SSL, also called TLS, and we'll define them; SSH, which you've used to log into a remote computer; and another one, maybe less common, IPsec, security for the IP layer. There are others. The ones we will cover are SSL and SSH, which are about securing our application data. We will not cover IPsec. So what are some of the things that can go wrong in internet communications? What are some of the threats, what may happen if those threats become actual attacks, and how can we prevent them or stop them from occurring? We will not go through all of them. Authentication: someone can masquerade as another person, impersonating a real user. So if I pretend to be some other user, that can be a potential threat because there's some misrepresentation of the user.
The other person receives some information and believes it, when in fact it is from someone else or it's false. How do we stop that? Well, we know the techniques for preventing such attacks. We've gone through the authentication techniques of hash functions, MAC functions, digital signatures, and symmetric and asymmetric cryptography. What about confidentiality? Again, if we send something across the internet, someone can listen in and get access to those messages. We lose information. Other people gain information that they shouldn't be allowed to access. And we lose privacy. Other people can see what we're doing. How do we stop that? Encryption is the main form, so that someone cannot see the information we're sending. We encrypt it before we send it. And another thing for privacy may be proxies. Proxies are a way to hide who is communicating. We will not go into it, but when you're sending packets across the internet, someone in the middle can see the IP addresses of the source and destination. And usually you can correlate an IP address to a location, sometimes to an individual user. So if you see a packet going across the internet and you see two IP addresses, you can work out that these two people are communicating with each other. And that's an attack on privacy. How do we stop that? Try to use fake IP addresses. Well, not really fake, but redirect your traffic through other devices, web proxies, so that the person who's monitoring what you're doing sees the traffic coming from a web proxy as opposed to from you. So they cannot see who the actual sender of that information is. We may show you an example of a web proxy later, either today or next week. So that's confidentiality and authentication. Then integrity: make sure that the data we receive has not been modified. Similar techniques apply. We can use checksums, hash functions, MAC functions. Denial of service is another major issue in the internet.
And we haven't really touched upon that in this course. We have web servers. Companies offer those web servers and make a lot of money from the services they provide. Take Amazon as an example. If the Amazon web servers are not working correctly, the company loses money, because people are buying via those web servers on a regular basis. A denial of service attack may try to stop the Amazon web servers from running. If it succeeds, that disrupts the service for the users and eventually leads to a loss of money for the company. How do we stop that? It turns out it's very difficult to prevent. Simple denial of service attacks can be prevented, but complex attacks are very hard to stop. So if we want to use these cryptographic techniques to prevent such threats, we need some protocols to support them. The three main options in the internet are shown in these three pictures. One approach is to add an extension to IP at the network layer. Remember, IP is part of the network layer. When we send anything across the internet, we use IP, the internet protocol. It doesn't have any built-in security, but there's an optional feature called IPsec. It adds encryption for the datagrams that we send across the internet. Another approach is to add some optional encryption or security features on top of TCP. Again, TCP doesn't have any security features, but there are extensions called SSL and TLS, secure sockets layer and transport layer security, that effectively add encryption and authentication features to TCP. And then there are application-specific protocols, protocols for specific internet applications: web browsing, email, instant messaging, and so on. Different application protocols may have extensions to support security, encryption and authentication. Which one do we choose? Well, they have advantages and disadvantages.
Suppose we use IPsec to encrypt our data at the network layer. Remember, everything that we send across the internet goes through the network layer, goes through IP. It doesn't matter if we're using TCP, UDP, HTTP, or any application protocol; everything is carried in an IP datagram. By using IPsec, you can make sure everything that is sent from your computer is encrypted and authenticated. So IPsec can be applied to all the information your computer sends and receives. SSL and TLS, which are effectively the same, only work for TCP traffic. Many applications use TCP, web browsing and email for example, but some do not. Some instant messaging and voice applications use UDP. So SSL only applies to TCP applications. If you want something for UDP, it's of no use to you. Alternatively, you can do encryption for a specific application. You can use your email client to encrypt the email message before you send it. PGP is an example. In your email client, you type in your email message, press encrypt, and it encrypts the message. That message is then sent using the normal protocols across the internet, and the receiver, if they want to read that email, needs to decrypt it. So that's application-specific encryption. It applies to just one type of application. If you encrypt your email, of course that doesn't encrypt your web traffic. Normally IPsec is implemented in the operating system, which means your OS needs to support it. Most current desktop operating systems do. Some smaller or lightweight operating systems, for example on mobile phones, may not, because of the complexity of implementing and running it. SSL and TLS are usually implemented in applications or libraries, and you know one of the main libraries that supports SSL: OpenSSL.
You've used the command line version, but it's in fact a library: you can write an application that uses the features of OpenSSL to encrypt your data, to sign data and so on. So that's common for SSL. Application-specific security is implemented in a single application. Your email client, Thunderbird or Microsoft Outlook, may implement encryption of emails. Of course that doesn't apply to your web browser. SSL, in contrast, can be used to encrypt anything that uses TCP, both your web browser and your email. And IPsec can be used to encrypt anything that your computer sends across the internet: voice, instant messages, email, web browsing, whatever. We're going to focus on the transport layer. We have limited time, and it's maybe the most commonly used form in the internet today, so we'll focus on that. First we'll look at TLS and SSL, then a special case, HTTPS, which is related, and then an alternative, secure shell. In fact, secure shell is application-specific, but it uses some of the transport layer mechanisms as well. Let's just introduce what they are. Secure sockets layer, SSL, was created by Netscape. Netscape created one of the most popular web browsers years ago, and to ensure that the traffic sent by the web browser to a web server was secure, they implemented SSL. At some stage it was renamed TLS, transport layer security, when the IETF made it a standard. SSL version three and TLS are effectively the same. So nowadays, when you hear of SSL and TLS, in most cases they're the same. It's maybe more common to see SSL as the acronym being used, but they mean the same thing, in our course at least. SSL provides security services (encryption, authentication, data integrity) to the application layer. So any protocol that uses TCP can also use SSL, and in fact SSL consists of a set of different protocols.
Before we go through that slide, just a reminder of our typical five layer stack. At the bottom two layers we have our hardware, the data link layer and the physical layer. The core part of our computer attached to the internet is IP, the network layer. We're focusing on TCP as the transport layer, and we have some application protocol on top. So for example, if this is HTTP, when you click on a link in the web browser on your computer, the web browser generates an HTTP GET request and sends it to TCP. TCP puts it into a segment, so attaches a header, and delivers it to IP, which adds the IP header. Eventually it is sent across the internet and received by the server, which processes it. What we want to make sure is that for anything sent between your computer and the server computer, no one can read the contents. That would be one goal. The other is authentication, so that we know who we're communicating with. What SSL does is insert some extra functionality between the application and TCP. That's what's shown here. The diagram doesn't show the application at the top, so the top two layers in this diagram go in here. Normally your application creates a message and sends it to TCP. Now your application creates a message and sends it, eventually, to the SSL record protocol. But before it does that, we need some steps to set up the secure connection, so there are some different features for that. You're all good programmers. How does your application send something to TCP in, say, the C programming language? You're writing your own application. You want to send some data using TCP. What C function might you use? write, okay? You've done sockets programming. You've seen this in the lab: when your application calls the write function, it sends data to TCP. In fact, before you call the write function, you'd perform a connect to connect to the server. So there are some setup steps.
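As a reminder of what that looks like without any security, here is a minimal sketch of the connect-then-write pattern. It is written in Python rather than C purely to keep it short; `sendall` plays the role of `write`, and `create_connection` does the `connect`. The function names `send_plain` and `loopback_demo` are made up for this illustration, and the demo talks to a throwaway server on the local machine so that it runs without internet access.

```python
import socket
import threading


def send_plain(host: str, port: int, data: bytes) -> None:
    """Connect to host:port over TCP and send data with no encryption."""
    # Setup step: establish the TCP connection (the C equivalent is connect()).
    with socket.create_connection((host, port)) as sock:
        # Hand the bytes to TCP (the C equivalent is write() on the socket).
        sock.sendall(data)


def loopback_demo() -> bytes:
    """Start a one-shot local server, send to it, return what it received."""
    received = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def accept_one() -> None:
        conn, _addr = server.accept()
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # empty read: the client closed the connection
                    break
                chunks.append(chunk)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=accept_one)
    worker.start()
    send_plain("127.0.0.1", port, b"GET / HTTP/1.1\r\n\r\n")
    worker.join()
    server.close()
    return received[0]
```

Anything sent this way travels as plaintext, so anyone who can capture the packets can read the request.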
So when we use SSL, instead of using the write function, there are some other functions available that allow you to send data, but now send it securely. I can't remember exactly what the names of the functions are, but for example, let's say it's SSL_write. Instead of calling write, your application now calls some other function, SSL_write, which takes some data and, instead of sending it to TCP, sends it to SSL. SSL does the encryption for your application and then delivers the result to TCP, which sends it across the internet. Where do these functions come from? Well, usually you don't have to implement them yourself. There's a library. One of the most common libraries is provided by OpenSSL. That's the primary purpose of OpenSSL: not as a command line program, but as a library, so that you can write applications that take advantage of the SSL security features. So OpenSSL provides some C functions that allow your program to send data securely. But before you can send data, you need to set up a connection to the server, to the other endpoint. So in fact, SSL is made up of four different protocols. The record protocol is used for encrypting messages and providing integrity of messages. For example, it encrypts the data and adds a MAC to the end, so that when we receive that data, we can decrypt it and check whether it's been modified. That's the role of the record protocol. All the data goes via the record protocol. But before we encrypt the data, which encryption algorithm do we use? Which MAC function do we use? Well, the client and server need to negotiate which functions to use. So there's a handshake protocol. Before we encrypt anything, we need to perform a handshake between the client and the server. In it we authenticate the other side, checking that the entity we're communicating with is who they say they are, and we negotiate parameters, for example the keys and the algorithms to be used.
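To make the "call a secure send instead of a plain send" idea concrete, here is a minimal client sketch. It uses Python's standard `ssl` module rather than the OpenSSL C functions, so treat it as an analogy rather than the C API: `wrap_socket` runs the handshake, and `sendall` on the wrapped socket plays the role of the secure write. The function names `make_client_context` and `send_secure` are made up for this illustration, and the host and port are whatever the caller supplies.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context with certificate checking switched on."""
    # The default context verifies the server certificate and its hostname,
    # which is the "authenticate the other side" part of the handshake.
    return ssl.create_default_context()


def send_secure(host: str, port: int, data: bytes) -> str:
    """Send data over TLS and return the name of the negotiated cipher."""
    context = make_client_context()
    # Setup step 1: an ordinary TCP connection, exactly as before.
    with socket.create_connection((host, port)) as tcp_sock:
        # Setup step 2: the handshake happens inside wrap_socket(); the
        # algorithms and keys are negotiated here.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # From here on, the data is encrypted before it reaches TCP.
            tls_sock.sendall(data)
            # cipher() returns (cipher name, protocol version, secret bits).
            return tls_sock.cipher()[0]
```

A call such as `send_secure("example.org", 443, b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n")` would need network access, and which cipher it reports depends on what the client and that particular server agree on.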
It turns out that you can start with a particular encryption function and change it during the connection. You can change the cipher or algorithm. So there's a special mechanism, the change cipher spec protocol, to change the cipher in use. And if something goes wrong, there's a mechanism to send alert messages. Maybe you receive a data packet which doesn't decrypt successfully or fails the authentication. So there's an alert protocol to send the other side a message saying there's some warning or error. We will not say much about the alert protocol, and we'll touch only briefly on changing the cipher; we'll mainly cover the handshake and then how the record protocol is used. Let's just see what we can finish on. We'll talk about connections and sessions later. The basic approach for our record protocol is this: once we've set up the connection from client to server, we take our data. The application creates some data and calls some function to send that data. Then SSL goes to work, and what it does is shown in this diagram. It takes the data, divides it into fragments, and optionally compresses each fragment. We don't have to use compression, but we can compress to make the fragment smaller. We use a MAC, a message authentication code, to provide authentication and integrity. So we take this compressed value and calculate a MAC. What does a MAC take as input? Remember, a hash function takes just data as input; a MAC takes data and a key. The data is the compressed fragment here. The key would be negotiated in the handshake. We attach the MAC value to the end. We take all of that and encrypt it using some negotiated symmetric key encryption algorithm and some key. So we get some ciphertext. We attach an SSL record header to the start and then we send that to TCP. All of this is happening between the application and TCP. So if we were to show the next layer here, we'd have TCP here.
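The record steps just described (fragment, compress, MAC, encrypt, add header) can be sketched roughly as follows, with the receiver's inverse steps included so that the round trip can be checked. This is a toy under loud assumptions, not the real record format: the cipher is a made-up XOR keystream standing in for a real symmetric cipher and is not secure, the MAC is HMAC-SHA-256 over the compressed fragment only, the header is a simplified type, version and length, and the names `protect` and `unprotect` are invented for the illustration.

```python
import hashlib
import hmac
import struct
import zlib

FRAGMENT_SIZE = 16384  # fragments are at most 2**14 bytes
MAC_LEN = 32           # length of an HMAC-SHA-256 tag


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for a symmetric cipher: XOR with a hash-derived keystream.

    NOT secure (the keystream repeats for every record); it only marks
    where the negotiated encryption algorithm would be applied.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + struct.pack(">Q", counter)).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(data: bytes, mac_key: bytes, enc_key: bytes,
            fragment_size: int = FRAGMENT_SIZE) -> list:
    """Sender side: turn application data into a list of records."""
    records = []
    for start in range(0, len(data), fragment_size):
        fragment = data[start:start + fragment_size]       # 1. fragment
        compressed = zlib.compress(fragment)                # 2. compress (optional)
        tag = hmac.new(mac_key, compressed, hashlib.sha256).digest()  # 3. MAC
        ciphertext = _toy_cipher(enc_key, compressed + tag)  # 4. encrypt
        # 5. header: content type 23 (application data), version 3.0, length
        header = struct.pack(">BBBH", 23, 3, 0, len(ciphertext))
        records.append(header + ciphertext)
    return records


def unprotect(records: list, mac_key: bytes, enc_key: bytes) -> bytes:
    """Receiver side: strip header, decrypt, check MAC, decompress."""
    out = []
    for record in records:
        _ctype, _major, _minor, length = struct.unpack(">BBBH", record[:5])
        body = _toy_cipher(enc_key, record[5:5 + length])  # XOR is its own inverse
        compressed, tag = body[:-MAC_LEN], body[-MAC_LEN:]
        expected = hmac.new(mac_key, compressed, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("MAC check failed: record was modified")
        out.append(zlib.decompress(compressed))
    return b"".join(out)
```

With a small `fragment_size` you can see a message split across several records, and flipping any byte of a record makes `unprotect` raise instead of returning modified data. In the real protocol the two keys come out of the handshake rather than being passed in by hand.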
TCP receives this and follows its normal steps: it creates a TCP segment and sends it to IP, IP sends it, and it goes across the internet. When it's received at the server, it eventually comes up the stack to the SSL record protocol, and we do the opposite steps. We remove the header, decrypt, check the MAC to make sure there's nothing wrong with what was received, and decompress, and we get the fragment back. We do that for each fragment we receive, eventually get the application data back, and then pass that application data up to the application using a read, for example. Whenever the application reads, it receives that application data. After the break we will go through how to set up a connection and how to negotiate parameters, and then go through an example from a capture of how that happens in practice. Let's have a break now and continue at 2:40.