We're now going to look at web security, security for a specific type of application: web browsing. In this topic, we'll look at a couple of aspects of web security. In the next one, we'll look at web security from the perspective of attacks on applications: when someone creates a web application, what types of attacks can take place? And then we'll see some related protocols come up even in the last topic. Here we'll just start on web security. What we need to do in this topic is, first, remind ourselves what we mean by web browsing, what protocols are used, and how they work. Very simple stuff that I think most of you know. Then we'll look at two specific techniques for web security, two of many. We'll look in detail at HTTPS: when you access a website with HTTPS as opposed to HTTP, what does that mean? And related to that, certificates: how we confirm that we're communicating with the correct server.
Web browsing, and we'll go through this quickly. Anyone not know what web browsing is? Anyone not know what protocol is used for web browsing? No one put their hand up, so everyone knows the protocol used for web browsing. So we'll go through it quickly and just remind you. Here's a simple view of web browsing. The human user uses a web browser, a piece of software, and there are different implementations of web browsers: Firefox, Chrome, and so on. And they use that to access websites. So there's another piece of software, a web server, running on another computer in the internet, and there are different implementations of web servers. You may not have used them much yet, but Apache is a very common open source web server. Microsoft IIS is a web server, and there's nginx (pronounced "engine X"), and many other smaller web servers. So there are implementations of web browsers and servers, and they are software. And note, when I say a server, do I mean hardware or software? If I say, go and set up a server, what do I mean?
Well, sometimes we mix them. Sometimes it means hardware, sometimes software. Here a web server is just a piece of software. It's just an application that runs on any computer. I can run a web server on my laptop, on my PC, or you can run it on your mobile phone. It's a piece of software. But often in practice, for popular websites, the web server software would run on hardware specialized for acting as a web server. So sometimes when we say a web server, we're also talking about the hardware, right? The Facebook website doesn't run on someone's laptop or someone's office PC. It actually runs on many dedicated pieces of hardware tailored to serving web pages. But here when I say server, I mean software.
And we know how web browsing works, and I think we know that the protocol used to communicate between the web browser, the client, and the web server is HTTP, the Hypertext Transfer Protocol. We send a request for a web page. The web server sends back that web page in a response. A little bit more depth about the protocols. Yes, we use HTTP to send a request to the server and get a response. And it's that simple. It's just one request, one response. If I want another web page, I send another request to get another response. If I want to download the images in a web page, and most web pages have images embedded, then I send another request and get another response for each image. So a simple request and response. And with HTTP in the normal mode, the next request has no connection to the previous one. We say it's stateless. The browser just says, OK, the user wants this page, sends a request, gets a response. Now it says, the user wants this other page, sends a request, gets a response. As far as HTTP is concerned, that second request is not related to the previous one. We'll see the consequence of that shortly. HTTP is sent across the internet using a transport protocol, TCP.
So in fact, before we send that HTTP request, we establish a TCP connection from the browser's computer to the web server's computer. And we use port numbers to identify those applications. Server port number, what is it? Easiest question today, what's the server port number? Port number 80. Remember it: a web server listens on port 80. What port number does your browser use? It's not 80, it's something else, and it's unpredictable. It's chosen by your operating system when your browser tries to connect to a server, so it will be dynamically assigned. Today, it may be 50146. In five minutes, you may have a different number. So that changes, we don't know. But the server port will normally always be port 80. We'll see some exceptions come up. Of course, TCP packets are sent using IP across the internet. So the other things of interest to us are the IP address of the browser, or rather the browser's computer, and the IP address of the server, or the web server's computer. So they are used to identify our communications.
We will not go through the details of the HTTP message format. A few slides give the details here. We'll see them through examples a little bit better. It's a request-response protocol. One request has no relationship to the previous request, so we say it's stateless. Another name for the browser is a user agent. So we'll see the technical name is a user agent. The browser or client sends the request, and the server replies with a response. Port number 80. What does a message look like? HTTP messages are just text messages. So they're written in plain text, and they have some format. In the generic format, there's what's called a start line, and there's a different start line for a request and for a response. There are optional header lines. And a header simply contains the name of a field, a colon, and then the value. Then an empty line, and maybe the data or the message body.
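As a minimal sketch of that generic format (start line, optional header lines, an empty line, then maybe a body), here is how a request could be built and a response taken apart in Python. The function names, the host www.example.com, the requested path, and the sample response are all made up for illustration. Lines are separated by carriage return and line feed, as HTTP/1.1 requires.

```python
def build_request(method, path, headers):
    """Build a plain-text HTTP/1.1 request: start line, header lines, empty line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(text):
    """Split a plain-text HTTP response into status code, reason, headers, body."""
    # The first empty line separates the start line and headers from the body.
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    _version, code, reason = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Each header is a field name, a colon, then the value.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


# Hypothetical request and a hand-written response, only to show the layout.
request = build_request("GET", "/test/index.html", {"Host": "www.example.com"})
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
status, reason, headers, body = parse_response(response)
print(request)
print(status, reason, headers, body)
```

A real browser sends more header fields than this, and a real parser has to deal with cases this sketch ignores, such as repeated header names and binary bodies.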
And we'll see that come up in examples. Well, here's the first example. Step one, you click on a link on your web page. Or you type an address in the address bar of your browser and press Enter. What happens then? That eventually triggers your browser to send a request to the web server. And the request will contain many things, but one piece of information in there will normally say, get this web page. This is just an example: GET /test/index.html. It'll contain other stuff as well. So GET is the type of request it's making. There are other types, but GET is common. When the web server receives the request, it checks, do I have this web page? It looks on its hard disk. If it's there, then it reads it in and sends back a response. And the response will have a start line saying, we're using this protocol, HTTP version 1.1, and everything's OK. Your request was OK, the response is successful. So there's a status code that is used to indicate everything's OK: the number 200, followed by the word OK. We'll see some other examples. Then maybe some optional header fields, which I don't show, then a blank line, and then the message body. And in this case, the message body is the contents of the file we requested. I requested some index.html file. That file is sent back in the response. When the browser receives the response, it checks the headers and then takes the HTML and shows it on your screen. So that's the quick example of HTTP.
Questions? I assume you know this already, so we go through it quickly. You know it from other courses or your general knowledge of using computers. Everyone's created a website before, I'm sure. So this is easy. When you created a website before, what software did you use to create it? Notepad, OK? Or Notepad++, if you're advanced. A website in very simple terms is just some text, but formatted according to HTML.
So you just need a text editor to write those web pages and save them on your server. Assuming that server is accessible by others, they can access it using HTTP.
The request messages have some more details. So the start line has a particular format. There are different types of requests, or they're called methods. GET is common: I want to get this resource. Another one we will see is POST, which is really the client posting data to the server. GET means that we want to get data from the server. POST means I want to send some data to the server, and that will become useful shortly. The response messages: 200 OK we will see commonly, that's normal. This one you may have seen a lot, 404 Not Found, when you try to access a page that doesn't exist. But there are many other response codes. And both requests and responses may contain optional header fields. And there are many, many different header fields. Some are listed here, but we will not explain them until we see them in a detailed example.
So web applications. Who has developed their own website for a class or for some other purpose? I think the answer is yes, you did it in database systems maybe last semester, correct? You created a website, so you just opened up Notepad or your favorite text editor. You wrote some HTML, saved it on a computer, and then your website was complete, correct? Did you do more than that? Did you just write HTML? You also wrote some style sheets, CSS maybe, if you designed your page well. So instead of including the style in the HTML, you had some separate CSS files. You still just need a text editor for that. Anything else you needed to do? I think most of you did database systems. Apache was the web server you may have used. You may have used something like LAMP, or what's the Windows one, the package that installs MySQL and Apache on Windows. So there's software to install Apache and your MySQL database. And what else does it install?
What else did you do to create your website? Yes, PHP. You wrote some code using the programming language PHP. Why would you write PHP? What was it for? The HTML is just used to display the web page. What was the PHP for then? Why do you need it? Note that a simple web page, yes, we can write in just HTML, and that would be a static web page. But I think in your cases, you created dynamic web pages. The content changed depending upon the request. And also the content came from a database. The content that you sent back was not stored in an HTML file, it was usually stored in a database. And you used PHP to have this dynamic content and database access. And most web applications today are not just HTML files, but they have some programming language that generates content.
We need dynamic content today. Many websites have dynamic content. It provides interactivity. It provides tailored content based upon who's accessing it. If someone logs in and they see the website, it says, hello, Steve, welcome. If a different person logs in, it will say their name. Same web page, but the content is tailored to the particular user. And to provide interactivity and tailored content, we either use client-side programming languages, which are run in the web browser. JavaScript is common. Others you may find are Flash, Silverlight, or even Java. The code is sent from the web server to your browser, and your browser executes the code. It executes the JavaScript. Or we use, maybe in combination, server-side processing, where there's PHP or code in one of many other languages on the server. When you access a particular website, that code executes on the server and generates content for the web page to be sent back. And often, content is stored in databases. So I think you've all had experience with that already. And it looks like this with respect to HTTP. Still, the web browser sends an HTTP request to the web server, requesting some web page.
But instead of maybe requesting index.html, it may be requesting index.php. And the web server receives this and realizes, ah, this user wants index.php. I know that if anyone requests a PHP file, I need to execute that file. So the web server passes the PHP code to an engine, the thing that will execute the file, the PHP software. It executes the PHP code. Maybe, optionally, that PHP code connects to a database and sends a query, SELECT * FROM this table and so on, to the database. The database executes the query and returns some data, and PHP puts it into a nice format. And then PHP sends the response to the web server. And the web server then takes that response and puts it into the HTTP response to send back to the web browser. So that's the more common approach of server-side processing for dynamic content, which I think you've experienced. And that allows us to have things like tailored content, to have dynamic content.
Any questions on server-side processing for dynamic content? In the homework after the firewall homework, you'll need to look at some PHP and maybe do some PHP programming. Is that OK? Not OK? Then maybe it's time to withdraw. You've still got a chance. We'll have a couple of examples using PHP, because I think you've all used PHP. So you just need to be able to understand what's happening there. So that's the introduction to web applications.
What are the security issues? What can go wrong with respect to security? A number of things can go wrong. When we send data between browser and server, we often want that data to be confidential. The request and the response, especially the response sometimes, may contain information I don't want others to see. Well, because it's sent across the internet, who can see the request and response? Anyone on the path between my computer and the server. Let's say my computer is the laptop. The server, the website I'm accessing, is in the US.
So the path includes anyone who can access that Wi-Fi access point in the back of the room, or who can access the SIT computer network, because it passes through their devices. And then our ISP, so anyone who works for SIT's internet service provider can see my content, and the next ISP, and the next ISP, and so on. So there are many opportunities between the browser and server for an attacker to intercept and see the request and response. If the request contains my username and password for my bank account, I don't want others to see it. So sometimes we want confidentiality of the communications. How do we do that? We encrypt, and the encryption is provided using HTTPS. So in this topic we'll look at, well, what do we mean by HTTPS? How does it work? So yes, we encrypt the data, but we need a couple of other things to make the encryption work, and HTTPS does that for us.
Another issue: your browser connects to your bank website, and you send your username and password to the bank website. Are you sure it's the bank website? You sent it to the domain name you typed in, and that is correct, www.bank.com. That's what you know. The packet is sent from your browser out across the internet, and then someone replies. How do I know it is the bank web server replying? That it's not someone in the internet pretending to be the bank web server? So that's a problem. How do I know that the server is who they say they are? Because if someone here intercepts my packet, they can pretend to be the bank web server. They have a fake web server running, and send back a response saying, thank you for trying to log in, but the website is currently unavailable. I think, OK, the bank website's not available. But now they've learned my username and password, because I submitted them when I tried to log in. And now they access my real bank account. So we need some way for the client, when they communicate with the server, to know for sure this is the real server that I'm trying to access.
And that's an important problem to solve. And the approach is to use digital certificates. So in this topic we'll look at what they are and how they solve that problem.
There are other problems. How does the server know it's you? Here's the bank website. Someone sends a request to the web server of the bank: I am Steve, show me the bank account details for Steve. How does the bank know it's Steve, and not someone pretending to be me? Very simple. How does the server know that it's the right person accessing it? Again, it asks you to input your personal information, such as a username and password, your ID and password. So the way that the server knows it's the right person using the client is usually by login, username and password. When you visit the bank website, how does it know it's Steve visiting the bank website? Because it's just some computer sending packets. Well, it prompts me for a username and password that I've registered. And then if I supply the correct ones, it logs me in. But we know how to use usernames and passwords, because we've studied passwords in the topic on authentication. So password authentication is very common for the server to be sure it's communicating with the intended user.
In the web application you did for database systems, did you have a login? Who had a login? Hands up. Who developed some login page for a website, maybe using PHP, where a user logs in? Or did you just have a very boring website and get an F? I think some of you would have had logins where the user has to supply a username and password. Right, and you need some session management. In PHP, you can use a PHP session ID so that when the user logs in, then the next time they access the website, the server knows it's the same person. They don't need to log in for every access. So there's some idea of session management. The reason you need that is because HTTP is stateless, meaning you send the first request, log in, and the response comes back.
The next request from that same browser to the server, the server doesn't connect that next request to the previous one. It doesn't maintain any state there. You implemented some state management or session management using the PHP features like the session ID, I can't remember the details. So you know how to do password authentication and session management for web pages. Of course, that should be secure, in that the password information should be securely transferred between browser and server. So HTTPS is important. And of course, the password should be stored correctly in the database, as a hash of the salted password.
What else can go wrong? When you set up the web server, someone accesses your website. What if they could get the web server to do something unexpected? Normally your web server only sends back a certain set of web pages. So files in this directory can be served to browsers. All other files on your hard disk are not accessible to web browsers. So if an attacker can somehow get your web server to maybe read the password file on your disk and download that, then they can learn confidential information. So we need access control for this engine, and especially for the web server, over what they can access on this computer. We will not look in detail at how to set that web server up securely. We'll assume it works OK.
If we get time at the end of the semester, we'll look at this last issue of not just keeping your data confidential, but keeping your actions confidential: privacy. When you access many websites, first, the web server knows when you accessed it and what pages you accessed. And maybe if they can collect data across many websites, they can get some trends or some behavior of what you're doing. Not just the web servers can observe that, but people in the internet can observe what websites you're accessing. Sometimes you want to keep that private. So we'll talk about how to be anonymous in the internet.
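Going back to the login and session management point, here is a minimal sketch of the idea, written in Python rather than PHP, and not how PHP sessions are actually implemented. The user "steve", the password "secret", and all function names are made up for illustration. The server stores only a salted hash of the password, and after one successful login it hands out a random session ID, so that later, otherwise unrelated, requests can be linked to that login.

```python
import hashlib
import hmac
import os
import secrets


def hash_password(password, salt):
    """Derive a salted hash of the password (PBKDF2 with SHA-256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user table: username -> (salt, salted hash), never the password.
_salt = os.urandom(16)
users = {"steve": (_salt, hash_password("secret", _salt))}

# Server-side session table: session ID -> username.
sessions = {}


def login(username, password):
    """Check the password once and return a fresh random session ID, or None."""
    record = users.get(username)
    if record is None:
        return None
    salt, stored_hash = record
    if not hmac.compare_digest(stored_hash, hash_password(password, salt)):
        return None
    session_id = secrets.token_hex(16)
    sessions[session_id] = username
    return session_id


def handle_request(session_id):
    """HTTP itself is stateless; only the session ID ties a request to a login."""
    user = sessions.get(session_id)
    return f"Hello, {user}" if user else "Please log in"


sid = login("steve", "secret")
print(handle_request(sid))    # request that carries the session ID
print(handle_request(None))   # request with no session ID
```

In a real web application the session ID would travel in each request, typically in a cookie, which is one more reason the traffic needs to be protected with HTTPS.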
So this topic is about how HTTPS works and how digital certificates work, and they're related. The next topic will look more at attacks on web applications. So we need to look at how we communicate confidentially between browser and server. We do so using HTTPS.
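As a rough preview, here is a minimal sketch of what a client does differently for HTTPS, using Python's standard ssl module. HTTPS is HTTP sent over an encrypted TLS connection, normally to server port 443 instead of 80. The function names are made up for illustration, and the fetch function is only defined here, not called, because calling it needs a network connection. The default context requires the server to present a certificate and checks that the certificate matches the host name we asked for, which is how the client confirms it is talking to the correct server.

```python
import socket
import ssl


def make_verifying_context():
    """Create a TLS context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def fetch_over_https(host, path="/", port=443):
    """Send one GET request over TLS and return the raw response bytes."""
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # The TLS handshake happens here; it fails if the certificate is not
        # trusted or does not match the requested host name.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n"
                "\r\n"
            )
            tls.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The HTTP request inside is the same plain text as before; what changes is that it is encrypted on the path between browser and server, and that the server has to prove its identity first.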