HTTPS is in fact just normal HTTP, but it uses another protocol to transfer the HTTP messages. That other protocol is called SSL or TLS. SSL stands for Secure Sockets Layer, TLS for Transport Layer Security. We'll see a bit more about them. They are in fact two different protocols, but some versions are almost identical. So remember, HTTPS is simply HTTP carried by another protocol underneath it.

From a user's perspective, to trigger your browser to use HTTPS, you specify the extra s in the scheme of the URL. The port number used by an HTTPS web server by default is 443. So if you set up a web server that supports HTTPS, normally that web server will support two modes, normal HTTP and secure HTTPS, and it accepts connections in both modes by using two different port numbers: port 80 for normal, 443 for secure.

When it's used correctly, everything sent in the HTTP GET request and the HTTP response is encrypted. So the entire URL that you request is encrypted. The contents, the document that comes back, the web page, is encrypted. Any values submitted in browser forms, like my username and password that I fill in on the form before pressing submit, will be encrypted because they are part of the HTTP messages.

Cookies. What are cookies used for? Some people who arrive late, they're challenged to answer some questions. What are cookies used for? To remember some history, so that the server and the client can keep track of what's happened in the past, what pages you've accessed. So they can reveal confidential information to someone who intercepts them, and they can be used to log in as someone else. So it's important that they are hard to obtain, so we encrypt them as well. So the entire HTTP messages are encrypted.

So that's one aspect, the confidentiality aspect of HTTPS. The other security aspect is that with HTTPS, we want to know that the server we're communicating with is the correct one.
It's not someone pretending to be the server we think we're accessing. The way that's achieved is using SSL and certificates. From the other perspective, the server wants to know who the client is, and that uses the normal technique of passwords. So you submit your username and password to verify that you are the real client, the person you claim to be.

So a little bit about SSL and TLS. It originated with one of the first web browsers, Netscape, as the Secure Sockets Layer. Over time it's gone through different versions and became Transport Layer Security, which is almost the same as SSL, or at least one of the versions of SSL is almost the same as one version of TLS. So the names change sometimes. I will often use the old one, SSL, but the newer protocol that continues to be developed is TLS.

SSL provides security services to application protocols using TCP. It's not just used by HTTP, it's used by other applications as well. It's quite a complex system, and we need to go through how some of the components work. But first, to place it in perspective with what you know about networking, we'll compare normal web browsing to the secure mode from a layered perspective.

Normally we use HTTP as the application layer protocol. So with respect to layers, we have the application layer. What's below the application layer? It starts with T. The transport layer, good. Transport layer, network layer, and then at the bottom the data link layer and the physical layer, which are often joined together. Normally when we use HTTP, what transport layer protocol do we use? TCP. And we use IP, and underneath, depending upon our hardware device, it's Wi-Fi, Ethernet, or some other technology. Those are not listed here, they're not of much interest.

With respect to implementation, where do we find these implemented?
Typically the operating system implements the network and transport layers. This is in the OS. Whereas applications implement the application layer protocols. So TCP and IP are part of the operating system. That's important because, being part of the operating system, normally end-user applications cannot change how they behave. You don't have permission to do much other than send and receive data using them. So we can't really modify how they work from the application layer. So what a web browsing application does, for example, is implement HTTP, generate a request and send it to the operating system saying, please send this data using TCP.

With HTTPS, it's similar at the operating system level, and at the application we're still web browsing, so we still must use HTTP. Our web browser generates a GET request and receives HTTP responses, except before sending them to the operating system we introduce a new layer here, or a new protocol, and that's where SSL or TLS comes in. The Secure Sockets Layer is a layer that we insert between TCP and HTTP to intercept the HTTP messages, encrypt them and provide other security services. It means we don't have to make changes to the operating system, we simply insert this extra protocol that does all the security features we need. It makes the development of applications and the updating of systems easier.

So now, whenever you generate an HTTP GET request, your browser creates the GET request. Before it goes to the operating system, SSL will encrypt it and hand the encrypted request to the operating system, which then uses TCP to send it out across the internet. When something comes back to your computer, it's received by TCP and sent up. SSL will decrypt it and then pass the decrypted contents to the HTTP software in your browser.

In terms of implementation of the top layers, HTTP we can think of as implemented by your browser, and so is SSL in a number of cases.
Or maybe there's a library that does it. Because SSL can be used by applications other than web browsing, sometimes your computer will have a library that implements it, and different applications will call that. What library do you know of that implements SSL? I think you've used it in one of your homeworks. Anyone remember the name? You've used OpenSSL on the command line to generate some keys and to do some encryption. That's a library that implements SSL, and it's used by many applications to provide security. So if you create your own web browser, you don't need to implement SSL, you just load the library and it does it for you. You don't have to worry about implementing the encryption and so on, you just implement your top layer protocol and use the library for the other features. There are other libraries that do that as well. So that's where SSL sits from the layered and the implementation perspective. If you look at different operating systems, they provide their own libraries. Apple OS X has its own library for secure sockets, Windows has a separate library, and there are several different open source libraries.

How does it work? Well, the main purpose of SSL is to encrypt the data before it's sent across the network. So HTTP generates a request, SSL encrypts it and sends it through the operating system, and as data is received SSL decrypts it. In SSL it's the record protocol that does that encryption and decryption. For example, when you generate an HTTP request, the SSL record protocol encrypts it and passes it to TCP, which sends it out. In the reverse direction, it decrypts.

But to support encryption we need to do a few other things. We need to set up a connection. So there's a handshake protocol that allows the client and server to set up a connection, negotiate parameters and choose the encryption algorithms to use.
Sometimes we'll change encryption algorithms during a connection, so there's a method for changing the cipher, for changing the encryption algorithms. And sometimes things go wrong. Maybe we can't decrypt a packet, or there's some change of status. So there's an alert protocol that allows us to send messages to the other side saying something's gone wrong, or carrying some status or warning message. So there are different parts of SSL. We'll quickly look at some of the concepts and then see an example from a capture of how it works.

First let's look at encryption, the way that packets are encrypted. I have an HTTP GET request. I want to send it from my browser to the server. That is what we call the application data with respect to SSL. It's the data we want to get to the other application. What the SSL record protocol does is take the application data and split it into smaller fragments, usually fixed-size fragments. Optionally it compresses them, to make them smaller and reduce communication overhead.

Then we do some security operations. The first is that we add a MAC, a message authentication code, which I briefly mentioned before. A MAC is used to authenticate what we receive. When the receiver gets this fragment, it uses the MAC to verify that it came from the right sender and hasn't been modified along the way. So instead of using a plain hash function, we use a MAC to do this verification. Then the data plus the MAC combined is encrypted. So we apply an encryption algorithm, usually a symmetric encryption algorithm. We attach a small header to indicate what fragment this is. And then we send that using TCP.

And of course we do the same for the other fragments. If there were three fragments, we would get three encrypted packets, which would need to be sent across the internet using TCP. The receiver does the inverse: it receives, checks the header, decrypts, verifies the MAC to make sure nothing has been modified, decompresses, and gets the fragment.
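To make those record protocol steps concrete, here is a rough sketch in Python. It is only an illustration of the order of operations (fragment, optionally compress, add a MAC, encrypt, add a header, and the inverse on the receiver), not the real SSL record format. Several things are assumptions made for this sketch: the function names protect and unprotect, the 16-byte fragment size, the header layout (a sequence number plus a length), and above all the "encryption", which is a toy keystream built from SHA-256 so that the example runs with only the standard library. A real implementation would use a proper cipher such as AES.

```python
import hashlib
import hmac
import struct
import zlib

FRAGMENT_SIZE = 16   # tiny on purpose, so a short request splits into several fragments
MAC_LEN = 32         # length of an HMAC-SHA256 tag in bytes
HEADER_LEN = 10      # 8-byte sequence number + 2-byte length (layout invented for this sketch)


def _keystream(key, seq, length):
    """Toy keystream, NOT a real cipher: SHA-256 over key, sequence number and a counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + struct.pack(">QI", seq, counter)).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(app_data, enc_key, mac_key, compress=False):
    """Sender side: fragment, optionally compress, add MAC, encrypt, prepend header."""
    records = []
    for seq, start in enumerate(range(0, len(app_data), FRAGMENT_SIZE)):
        fragment = app_data[start:start + FRAGMENT_SIZE]
        if compress:
            fragment = zlib.compress(fragment)
        mac = hmac.new(mac_key, struct.pack(">Q", seq) + fragment, hashlib.sha256).digest()
        body = fragment + mac                                   # data plus MAC ...
        ciphertext = _xor(body, _keystream(enc_key, seq, len(body)))  # ... is encrypted together
        header = struct.pack(">QH", seq, len(ciphertext))
        records.append(header + ciphertext)
    return records


def unprotect(records, enc_key, mac_key, compress=False):
    """Receiver side: read header, decrypt, verify MAC, decompress, rejoin fragments."""
    fragments = []
    for record in records:
        seq, length = struct.unpack(">QH", record[:HEADER_LEN])
        ciphertext = record[HEADER_LEN:HEADER_LEN + length]
        body = _xor(ciphertext, _keystream(enc_key, seq, length))
        fragment, mac = body[:-MAC_LEN], body[-MAC_LEN:]
        expected = hmac.new(mac_key, struct.pack(">Q", seq) + fragment, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError("MAC check failed: the record was modified or the key is wrong")
        if compress:
            fragment = zlib.decompress(fragment)
        fragments.append(fragment)
    return b"".join(fragments)
```

With two made-up keys, calling protect on a short GET request gives a list of encrypted records, and unprotect with the same keys returns the original request. Changing a single byte of any record makes unprotect raise an error, which is the point of the MAC.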
Once all the fragments are received, it joins them together, and the receiving web server gets the HTTP request and can process it.

So that is the data transfer. Now we'll go to the handshake protocol. Before we do any data transfer, we must agree upon what algorithms we're going to use and a number of other parameters. That's done in the handshake protocol. Its purpose is to allow the client and server to authenticate each other, to confirm that they're talking to the right entity. Especially in web browsing, the browser wants to be sure that who it's communicating with is the web server.

How do we do that? How do you know, when you send a message to a website, that it is the real website, not someone pretending to be that website? How do you authenticate the web server? You may have seen it happen in some cases. Or you may have seen the outcome of unsuccessful authentication: you access a website and your browser pops up a warning or an error message saying this web server cannot be authenticated, we do not trust it, do you really want to continue? Maybe the registration website for SIT, I don't know if it still does, but at least in the past, when you initially accessed it, it presented a warning. Why? Because your browser could not authenticate the server. It couldn't trust the server. It had no way to confirm it was the real server. The way it does that in practice is using certificates. We'll see their role later in this topic. For web browsing, we normally don't use SSL for the server to authenticate the client. We just use usernames and passwords there. But we could use certificates.

The handshake protocol also allows us to choose the algorithms to use. How do we encrypt? SSL doesn't mandate which algorithms to use. It allows you to select. There are algorithms used to encrypt our data, and it supports many different ones. So in the handshake protocol, the client and server agree which ones to use.
AES is supported, as are others. For the MAC used for authentication, there are different algorithms, so in the handshake protocol they agree: which one are we going to use today? And for obtaining a secret key: before I can encrypt with AES, both sides need to have the same secret key. How do we get that? There are key exchange methods, different ways to do it, so in the handshake we can choose the approach and perform the key exchange.

So really there are four steps in setting up a connection. We agree upon what to use, that is, we select the algorithms. We authenticate the server, checking that the entity we're communicating with is in fact the real server, and perform a key exchange. We may optionally authenticate the client, where the server checks that it's the real client. And once we've finished the key exchange, we finish setting up the connection and then send data. So there are a number of steps involved there.

Rather than talking about the details, we'll look at a specific example to finish this. The examples are from accessing a website, with the packets captured using tcpdump, and we'll look at them in Wireshark, which is a little easier than using the command line. I did the captures yesterday, so we're not accessing the website now. This is the set of packets that were captured when I accessed a website yesterday.

The IP address of my computer, I was using my laptop, was 192.168.1.2. What web server did I connect to? Maybe we can zoom in a little bit more. From just the list of packets captured here, what can you observe? Some things you should be able to identify. The addresses: what's the address of the web server I accessed? My laptop, or rather it was my desktop in the office, was 192.168.1.2. What website, what IP address did I access? 103.3.63.107. How do you know? That's correct, a good assumption.
He assumed that my laptop was connected to some router. He noticed 192.168.1.1 and thought that's most likely the router in my LAN. Well, it is my router, you're correct, but it's also the DNS server. Because what I did when I accessed the website was type in a domain name and press enter. You may not have studied DNS in much depth yet, but what happens when I type in a domain name is that my browser sends a request to a special DNS server saying, here's a domain name, tell me the IP address for that computer. That's what the first four packets are about. This is the DNS query, from my computer to a DNS server, saying here's a domain name, tell me the IP address. The domain name sandylands.info was what I typed in, and now the DNS server sends back a response. So 192.168.1.1 was the DNS server. The response comes here, and here's the answer: for this domain name, the IP address is 103.3.63.107. So that is the IP address of the web server, and then I contact the web server via that IP address. Maybe the easier way to see that: if you recognize the HTTP messages, you'll see those addresses in play.

So the first four are about DNS. Then we set up a TCP connection: SYN, SYN-ACK, ACK. Note the port numbers: from my browser, 37460, to the web server, 80. What protocol were we using at the application layer here? TCP at the transport layer, and above that, HTTP or HTTPS? Hands up for HTTP, no S. Two are correct. Everyone else is wrong. Why? Well, one way to tell is the port numbers. An HTTP server uses port 80. An HTTPS server uses 443. I'm still using normal HTTP here. Nothing's encrypted, and you can see that from the later packets. So I set up a normal TCP connection, SYN, SYN-ACK, ACK. Then I send my HTTP request. Then I get a response. And there are a few more. If I filter on HTTP, you see I send a request for slash, the root directory, to the web server. I got a web page in response.
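The port rule we just used to tell HTTP from HTTPS can be written down in a few lines. This is a small sketch in Python of what a client does with the URL before it opens the TCP connection: pick the destination port from the scheme, and build the plain-text GET request for the path. The function names connection_target and build_get_request are made up for this example, and the request headers are kept to the bare minimum. A real browser sends many more.

```python
from urllib.parse import urlsplit

# Default server ports by URL scheme, as described in the lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url):
    """Return (host, port, use_tls) for a URL, using the default port unless one is given."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.scheme == "https"


def build_get_request(url):
    """Build a minimal HTTP/1.1 GET request (as bytes) for the path in the URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

For http://sandylands.info/ the target is port 80 without TLS, and for https://sandylands.info/ it is port 443 with TLS. The GET request itself is built the same way in both cases. The only difference is whether it is handed straight to TCP or encrypted first.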
There's no need to show you, you believe there's a web page there, I think. We can show you. So I got the response. Note that inside this HTML web page there are some references to other resources. There's a link to a style sheet, site.css. So my browser automatically requests that file in the next request and gets it in the response. And also the favorites icon, the favicon, which I think you know is that little icon that sits in the tab of your browser or in your bookmarks. This is normal HTTP.

Let's look at HTTPS. I did a separate capture accessing the same website but using HTTPS. There are some DNS queries at the start. Again, the query asks, what is the IP address for sandylands.info? And my DNS server sends back a response saying it is 103.3.63.107. So that's the same as before, and now I know where to connect to the web server. But because I typed in https colon slash slash, my browser knows that when it sends that first TCP packet, it shouldn't send it to port 80 but to port 443. So the TCP SYN is going to port 443. My browser knows: if you use HTTPS, the destination port is 443. SYN, SYN-ACK, ACK. So that's still a TCP connection setup.

We set up the TCP connection. We don't then send the HTTP data. We now set up the secure connection, which is either SSL or, as listed here, TLS. TLS is the newer version. What I'll do now is filter and show just the TLS packets, to see the exchange that takes place. Otherwise you'll see there are some TCP ACKs in between. So we've set up a TCP connection, and then I filter on SSL. The protocol is SSL or TLS, specifically TLS version 1.2. We can hide that. And we see this is the secure protocol doing the exchange to set things up. Eventually we'll see the encrypted data.

Let's go through this example. The first two messages: I send the server a hello message. My computer's the client. I initiate communication saying, hello, I want to create a secure connection to you. And the server sends back a response, a hello message.
What's inside those messages? This is part of the handshake protocol. We do a handshake to set things up at the start. If we expand it, this is what the client says to the server. There are a number of useful things inside. Let's expand to get the right thing. The first useful thing that you may recognize is the set of ciphers that my client supports. It's got 11 that it supports here, right? So my browser is saying, let's use one of these 11 for our secure connection. Each set here is a long string, and they're listed in order: the top one is my first preference, the next one is my second preference, and so on. So the browser sends its preferences for how to do the encryption and other things to the server. The server chooses one and sends back what it chose. We'll see that in the server hello.

In the highlighted one, what do you recognize? What algorithm is used for symmetric key encryption? AES. How long is the key? These are typical exam questions: here's a capture, what encryption algorithm was used? AES. How long was the secret key? If you remember, AES supports different key lengths, AES-128 or AES-256 for example. Here it's AES-128. So AES is the symmetric key algorithm, with 128-bit keys. GCM is what's called the mode of operation. AES operates on a small block of data. If you have a large amount of data, the mode of operation defines how the blocks are processed and joined together. The technique used here is called GCM, but we haven't studied that. What hash algorithm is used? SHA-256. SHA is the Secure Hash Algorithm, and this one produces a 256-bit hash value.

So those were the easy parts: TLS, AES-128 for the symmetric key encryption, SHA-256 for the hash. But there are some other things, two more fields really. The first one says what algorithm we are going to use to exchange keys. For AES we need a 128-bit key known by the client and server, but we can't send that across the network unless it's encrypted or the exchange is performed in a secure manner.
So there's a secure key exchange approach, and there are several approaches. The first field here, the part after TLS, gives the algorithm used for key exchange. We haven't studied them, but the names include Diffie-Hellman (DH), variations of Diffie-Hellman such as Elliptic Curve Diffie-Hellman, and RSA. Those are the common ones. The second field is used for signatures. Note this one says ECDHE, ECDSA. What algorithm do you recognize that's used for signatures? In the first one, what's used to sign? The way to read this: after TLS, the next part is the key exchange algorithm, for example ECDHE, and the part after that is the signature algorithm, the public key algorithm for signatures. What do we use here? ECDSA. DSA is the Digital Signature Algorithm. If you look at the one underneath, it says RSA, and you should remember RSA. You have, I think, created RSA public and private keys. RSA is a common algorithm used there. So this one uses RSA, and the first one uses DSA, in a variant of DSA called Elliptic Curve DSA. You don't need to remember the names of all of them, but recognize the order: the first parameter is how we exchange keys, the next is used for signatures or authenticating the server, then the symmetric key encryption, and then the hash algorithm at the end. So my browser supports many variations.

My browser also sends some other things, like what compression it supports. What algorithm does my browser support for compression? None. It doesn't support any, okay? Null means it doesn't have any. Others could be zip or similar. It sends some random values, which are useful for the key exchange and other parts of the security exchange, and a few other options.

The server sends back a response, its hello, and it chooses one of those sets. In this case, the web server chose this particular one. It was the first one, well, maybe it was the second one in the list.
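Reading a cipher suite name field by field, as we just did, is mechanical enough to sketch in Python. The function parse_cipher_suite below is a made-up helper that assumes TLS 1.2 style names of the form TLS_keyexchange_signature_WITH_cipher_hash, such as TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256. Names in other styles are rejected rather than guessed at. The second helper, choose_suite, assumes the simplest selection rule, that the server takes the client's first preference that it also supports. Real servers may apply their own preference order instead.

```python
def parse_cipher_suite(name):
    """Split a TLS 1.2 style cipher suite name into its four algorithm fields."""
    protocol, _, rest = name.partition("_")
    handshake_part, separator, bulk_part = rest.partition("_WITH_")
    if protocol != "TLS" or not separator or "_" not in bulk_part:
        raise ValueError(f"not a TLS_..._WITH_... style name: {name}")
    # Before WITH: key exchange, then signature algorithm.
    # A single field (e.g. RSA) means the same algorithm does both jobs.
    key_exchange, _, signature = handshake_part.partition("_")
    # After WITH: symmetric cipher (with key length and mode), then the hash at the end.
    *cipher_tokens, hash_name = bulk_part.split("_")
    return {
        "key_exchange": key_exchange,
        "signature": signature or key_exchange,
        "cipher": "_".join(cipher_tokens),
        "hash": hash_name,
    }


def choose_suite(client_preferences, server_supported):
    """Return the client's most preferred suite that the server supports, or None."""
    for suite in client_preferences:
        if suite in server_supported:
            return suite
    return None


if __name__ == "__main__":
    offered = [
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    ]
    chosen = choose_suite(offered, {"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"})
    print(chosen, parse_cipher_suite(chosen))
```

In this small run the server only supports the second suite the client offered, so that is the one chosen, which matches the situation in the capture where the server did not necessarily pick the first entry in the list.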
So depending upon what the browser and the server both support, the server will choose one, and that will be used in the next steps. The server sends some other information. In particular, information for the key exchange is included in here. Since we haven't studied the key exchange protocols, I'll not show you those details.

The third message, sent by the server to my browser, carries its certificate. We will study what that certificate looks like and how it's used in this topic. But it uses a digital signature to confirm that this is the web server sandylands.info. Inside the certificate, we'll see some of the values, such as the domain name. It's a signed certificate, and the subject, we'll get there, we have to scroll across: the domain name is listed here. This is whose certificate it is. And we'll see it's issued by someone, by another organization, and it includes a public key, an RSA public key. So we'll return to the certificate in our lectures and look at its format and how it provides authentication of the server. But that's exchanged using SSL.

The next two messages complete the key exchange. Remember, to encrypt data we need a shared secret key. Both sides need it. So there's an algorithm that they use to exchange a secret so that no one else can work out what that secret is. That's happening in these first few messages. Then, last, they finish the handshake, I think at this point. From that point on, the data is sent, and if we zoom in on that data, it's encrypted. We can't make any sense of it. It looks like random bytes. So even when we capture it, this application data is encrypted.

This particular piece of data, what do you think it is? With respect to our web page, or rather our HTTP exchange. We know we were using HTTP. What do you think this packet is? It's the first bit of encrypted data. If I decrypted it, what do you think it would be? Who sent it? The source is still 192.168.1.2.
It's my laptop sending encrypted data to the server. What's the first thing I send? Normally an HTTP GET request, to get some web page. So most likely this is the GET request, but it's encrypted. We can guess that it is the GET request, and it's not so hard to work that out, but we don't know what page I requested. What was the specific page on the server that I requested? If I only have this capture, I don't know, because it's encrypted. The URL is encrypted, or the page part of it at least. And the subsequent packets: this one is from the server back to me, encrypted application data. It's a bit longer in this case, but it's all encrypted. This is probably the response, the web page coming back. All encrypted, of course.

So the idea is we set up a connection. First we set up the TCP connection, and then we set up an SSL connection: agreeing on parameters in the hello messages, authenticating the server using a certificate, and exchanging a key so that we can then do encryption. There are a couple of other messages to say we're ready to do encryption, and the application data is the encrypted data. The subsequent messages are for getting site.css, the style sheet, so that's requesting the next resource.

We will stop there. We'll next look at certificates, and we'll come back to that example and look at that certificate in detail and see how it's used to verify the server.
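If you want to reproduce that whole sequence yourself, here is a minimal sketch using Python's standard socket and ssl modules, which on most systems sit on top of a library such as OpenSSL. The function names make_client_context and https_get are made up for this example, and the request headers are the bare minimum. The steps follow the capture: open a TCP connection to port 443, run the handshake (which checks the server's certificate against the certificate authorities your system trusts), then send the GET request as encrypted application data.

```python
import socket
import ssl


def make_client_context():
    """Client-side settings: trusted CA certificates loaded, certificate and host name checks on."""
    return ssl.create_default_context()


def https_get(host, path="/", port=443, timeout=10.0):
    """Fetch a page over HTTPS; returns (negotiated cipher info, raw HTTP response bytes)."""
    context = make_client_context()
    # Step 1: ordinary TCP connection setup (SYN, SYN-ACK, ACK) to port 443.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # Step 2: handshake. Hello messages, certificate check, key exchange.
        # An untrusted or mismatched certificate raises ssl.SSLCertVerificationError here.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            negotiated = tls_sock.cipher()  # (suite name, protocol version, secret key bits)
            # Step 3: the HTTP request is plain text here; the TLS layer encrypts it on send.
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls_sock.recv(4096)  # decrypted by the TLS layer before we see it
                if not data:
                    break
                chunks.append(data)
    return negotiated, b"".join(chunks)
```

Calling https_get with a host name such as sandylands.info needs network access, so the result depends on the server at the time you run it. If you capture the traffic while it runs, you should see the same pattern as in the lecture capture: TCP setup, handshake messages, then application data that looks like random bytes.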