So we know ways to encrypt our data so that no one else can see it, to keep our data confidential. But we want to look at another aspect of security, often termed privacy: keeping the patterns of communication confidential. Who is communicating, when, and how often? This set of slides talks about several options for providing privacy in the Internet. Note that the slides were originally developed for something other than this course, so there are some background slides covering things you all already know, and we'll skip over those. There's also a list of acronyms that we'll use throughout, so just look them up as you need.
What is the Internet? Well, you know, but to explain the options we'll use a simple model of the Internet. There's you on your computer, and via the Internet, some network of computers, you access a server to achieve some aim. For the examples we'll use a web server, but in most cases the same applies to other servers. So our model so far is that your computer communicates across the Internet to a server, and we want to allow you to talk to the server, and the server to talk back, of course.
If we go into the details of that cloud, we can model it in a bit more detail: we know the Internet is really made up of different subnets connected together via routers. So what I draw here is the routers in between you and the server. The circles represent the routers. I've drawn seven in this case, but the number can be different. A common setup: maybe this is you at home, and you connect to your local router, your home router, which then connects to your ISP, your Internet service provider, so you pay some company to provide you Internet service.
And they have their router, or multiple routers, and then your ISP connects to another Internet service provider, and so on. Different Internet service providers connect together until we form a path that reaches the router of the server, and eventually the server we want to communicate with. Any questions on our network topology? So we have you, who wants to talk to a server, and we have routers in between.
One thing that we'll talk about when we look at the different Internet privacy options is the role of firewalls, and, maybe more interesting from the user's perspective, bypassing firewalls. If there is a firewall at some point on the path between you and the server saying you cannot access that server, how can we bypass that firewall? So in the examples I'll use, let's assume that one of these routers has a firewall on it, and the one I've chosen is at your ISP. And we'll consider what happens if it tries to block you from accessing the server. Can we bypass that?
So that's the topology we're going to consider. What about identifying computers? We know that computers have IP addresses, and we're going to assume that every computer has a globally unique IP address. It's not a true assumption. With technologies like network address translation, not every computer has a globally unique IP address. And you probably know that because you've set up your computer at home. You connect to a router via Wi-Fi or an Ethernet cable; what IP address does your computer commonly get? Even inside SIT you probably see repetitions, but at home you probably see a common address: 192.168.1.something. Maybe 192.168.1.1 for your router, 192.168.1.2 for the second computer, and so on. So you get 192.168.1.2 in your home and I get the same address in my home, so now we have two computers on the Internet with the same IP address, but we assumed that addresses should be unique.
We should have two different addresses. So how can both of us still communicate? I think you studied this in the computer networking course with Dr. Comwood: it's achieved using network address translation. With IP version 4 we basically don't have enough IP addresses to give everyone one. There are only about four billion addresses, and there are more devices than that which want to connect. So what people have done is say that inside your home network or your business network you use special addresses, these 192.168 addresses or 10.10 addresses. And then the router on your network translates your internal private address into one public address. That process is called network address translation.
I'm not going to explain how NAT works in detail. I hope you learned it, but the details are not necessary for understanding the rest. The idea is that we have our internal network, let's say a private network, and we use private addresses, a special range of addresses like 192.168.1.2, for the devices inside this internal network. And then there's a router that connects us to the rest of the world, the Internet. In the Internet we need globally unique addresses so we can route packets through the network. So even though there are multiple computers in here, the network address translation software inside the router normally converts these multiple different internal addresses into one public, globally unique IP address. That is, from the perspective of anyone on the Internet, when they talk to one of the computers in your private network, they think they're talking to 103.16.4.7, but that is actually the IP address of the router, and the router maps that to the specific internal address of your computer.
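To make the mapping idea concrete, here is a minimal sketch in Python. It assumes the common port-based form of NAT, where the router tells internal computers apart by giving each connection its own public port; the lecture only says that a mapping exists, so the port detail, the class name `Nat`, and the starting port 40000 are illustrative assumptions. The addresses are the ones used in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy NAT router: many private addresses share one public address."""
    public_ip: str
    next_port: int = 40000                             # assumed starting public port
    by_public_port: dict = field(default_factory=dict)  # public port -> (private ip, port)
    by_private: dict = field(default_factory=dict)      # (private ip, port) -> public port

    def outbound(self, private_ip: str, private_port: int) -> tuple:
        """Rewrite the source of an outgoing packet to the router's public address."""
        key = (private_ip, private_port)
        if key not in self.by_private:
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_private[key])

    def inbound(self, public_port: int):
        """Find which internal computer a reply belongs to, or None if unknown."""
        return self.by_public_port.get(public_port)


nat = Nat(public_ip="103.16.4.7")
a = nat.outbound("192.168.1.2", 5000)
b = nat.outbound("192.168.1.3", 5000)
print(a, b)                 # both show the same public IP to the outside
print(nat.inbound(a[1]))    # the router can still map the reply back
```

From outside, both computers appear as 103.16.4.7; only the router's table says which internal computer is behind each connection.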
So network address translation maintains that mapping of which internal computers are communicating at each point in time. In other words, from the public Internet's perspective, if these three internal computers are communicating with someone outside, it looks like it's just one computer. It looks like it's 103.16.4.7. So that's why we don't have globally unique IP addresses: this allows us to have another private network over here which uses the same range of addresses, .2, .3, but it needs a different globally unique address here. So inside our private networks we can reuse addresses, but they are mapped to public addresses, which cannot be reused.
That's the reality. Network address translation is commonly used. But let's assume it's not used, and that every computer that's communicating does have a public IP address. We'll talk about the consequence of that assumption when we get to it, but to keep things simple for now, let's assume that everyone has a globally unique IP address. Everyone has a different address. My laptop has a different address than your laptop at home and elsewhere. Routers have IP addresses. We know that. We can use domain names, but domain names map to IP addresses. So every computer is identified by an IP address.
Just on that issue of NAT, network address translation: what is my laptop's IP address? It's 10.10.98.50, but that's the internal address only. When I access a website outside, that's not the address that is used. The SIT router changes that 10.10 address into a public IP address. Which one does it change it into? There are websites that will tell you. In this case it's 203.131.209.66. Try it now on your phone or computer, those of you who have your computers open doing your homework for other courses.
Open your browser and visit whatismyipaddress.com. You can do it on your phone. It's just a free website that shows you your public IP address; using SIT, it's 203.131.209.66. His laptop shows the same IP address as my laptop. His is different. Why is yours different? He's using a different network: not the SIT network, but his mobile phone's 3G network. The others using the SIT network probably get this same address. So this is NAT working. We have different addresses internally, but the SIT router maps all of those internal addresses to this one public address. So from the perspective of people outside, they can't distinguish who it is. When we access this website, which is outside, it thinks the requests are coming from this one IP address, but it's actually different computers accessing the website. So there is some hiding of the individual computers performed by NAT.
Note that this website knows some other stuff. What else does it show us? It tells me a little bit about where I am. How does it do that? How does it know I'm part of Thammasat University? Here's a website in the US, and it knows that when I accessed it, I was at TU in Bangkok. Why does it know that? Registration of what? Of the ISP, specifically. IP addresses are allocated through Internet service providers. An Internet service provider, and in this case TU acts as an Internet service provider, is allocated a range of IP addresses from a national registry of IP addresses. So a range of addresses is given to TU, and that range is made public. So people know that the range containing 203.131.209.66, and others as well, belongs to Thammasat. There's a public database where you can find that information, and that's what this website has. When this address contacts the website, the website looks in that database and sees, ah, this address is in the range that belongs to Thammasat.
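The lookup the website performs can be sketched in a few lines of Python. This is only an illustration: the table below is made up, the /24 prefix for Thammasat is an assumed range chosen to contain 203.131.209.66 rather than the real allocation, and the function name `lookup_owner` is hypothetical. Real geolocation databases hold many thousands of ranges.

```python
import ipaddress

# Hypothetical range table; the prefixes are illustrative, not real allocations.
RANGES = [
    (ipaddress.ip_network("203.131.209.0/24"), "Thammasat University", "TH"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example ISP", "ZZ"),
]


def lookup_owner(address: str):
    """Return (organisation, country) for the first range containing address."""
    ip = ipaddress.ip_address(address)
    for network, owner, country in RANGES:
        if ip in network:
            return (owner, country)
    return None  # address is not in our small table


print(lookup_owner("203.131.209.66"))  # the public address seen by the website
print(lookup_owner("10.10.98.50"))     # a private address never appears in such a table
```

Note that the lookup works on the public address only, which is why the website can place me at the university but not identify my laptop.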
Or rather, it guesses that I'm from Thammasat, and it guesses quite accurately. You can go and download those databases; they're called IP geolocation databases. This is one website where you can download one for free. The free one maps an IP address to a country, and it's 17 megabytes; I downloaded it just before. You can get a bigger one, 500 megabytes and also free, that maps an IP address to a city, maybe to Bangkok for example. Or you can pay money and get a more accurate location. It will try to map the IP address to a more precise location, because they've collected the information to do that. So that's what that website has: such a database, which it uses to return that information.
So what can someone learn when they see a packet sent by my computer out on the Internet? Well, if they see the IP address, they can learn that the person communicating is part of TU. Because of network address translation, they may not learn that it is Steve who is accessing the website, but they learn it's someone at Thammasat. In some cases we would like to hide, from the web server and from others, that it is someone from TU communicating with it, and that's what the Internet privacy options are about.
So we're just setting things up before we get to the options. What else? We know how IP works; that's fine, you're experts. We know how web browsing works with HTTP: the browser sends a request, the server sends a response, simple. The request says what page we're looking for, index.html for example. And we know about port numbers. Security in the Internet, you know; we've just done a whole course on it. But let's talk about the difference between confidentiality and privacy. We often want to keep data confidential, and we do that by encrypting the data. We saw that with IPsec, TLS, and application security: encrypt the data, and no one can see it unless they have the key. Why do we keep data confidential?
Many reasons, okay, some examples are listed here. But sometimes, in addition to keeping the data confidential, we also want to keep our actions private. What does that mean? The actions may include who is communicating with whom, when they are communicating, and how often they're communicating: the frequency of sending messages between those entities. Why? Well, I'm sure you can think of reasons why you may want to keep actions private; here are some listed. Say you work for a company and you're not happy with that company, and while you're working for them you're looking for a new job. The company you're working for monitors Internet access, and they see you're visiting a job website. So they fire you, or make things hard for you for the rest of your time there. So you may want to hide your actions. Or you may want to report something illegal that your company is doing, and you don't want them to take action against you before it's reported. Or you may be visiting websites to learn about some medical condition you don't want others to know about. So there are many examples where we'd like to keep actions private. Some for good reasons, some for bad reasons. Bad people would want to keep their actions private as well.
How do we do that? Well, let's list some requirements that we want to achieve, and then we'll go through some techniques and see which techniques achieve these requirements. These are the common things we would like. We may not want all of them at the same time, so we'll see what the solutions can provide. First, I often don't want anyone but the server to see my data. The communications follow, sorry to go back, this model. You want to talk to the server S. You want to send data back and forth between you and the server. From the security perspective, you may not want anyone else but the server to see your data.
Of course, if you send data to the server, the server can see it, but you don't want anyone on the Internet in between to see that data. How do we solve that? To meet the first requirement, what do we do? I don't want someone to see my data. What do I do with that data? Encrypt it. So we've got ways to deal with that one.
Another thing we may want is that I don't want others to know that I'm communicating with the server. So I'm sending packets to the server, and I don't want someone in the Internet to know that it's me talking to that server. And there are maybe two sub-cases of that. Sometimes I don't want them to know while it's actually happening. That is, if I'm communicating with a server, I don't want them to be able to observe, as I'm doing the communication, that it's Steve talking to www.nastywebsite.com. So that's during the communication. Sometimes we'd also like to make it difficult to find out after the communication has taken place. Say I access a website, and next week someone comes to that website and tries to find out: did Steve access that website? How would they do that? They would look at the logs. You know from the Apache web server log that you can see at what time different IP addresses accessed it. So there are two levels there of what we may want in terms of security.
Sometimes I would like to be able to access a server without that server knowing it's me that's accessing it. So I want to be hidden from the server. And another security requirement that we'll sometimes have is that I want to access a server even when a firewall in the network is blocking access to that server. I'd like to bypass the firewall.
Beyond the security requirements, of course from a convenience perspective I'd like techniques which are free or cheap, so we don't have to pay money to use them. Easy to use and set up.
I don't have to do much on my computer or other computers to get this to work. And they should be fast: in terms of communications, they perform well. So we'll go through three main options for achieving some of these security requirements and compare them from the perspective of convenience.
To go through those we need to make a few assumptions. Let's assume encryption works: if we encrypt data, no one can decrypt it without the key. We assume there's a fixed path between you and the server, although in reality it may change; I think that's not relevant just yet. We're assuming that computers have globally unique IP addresses. Let's forget about NAT for a moment. And we're assuming that if someone knows the IP address, they can map it back to the user. So if they see the IP address, they know that it belongs to Steve. And it's not so hard to do that. And when a firewall is used in the network, we'll assume that it is a very simple packet filtering firewall which simply blocks based upon IP address. So those are the assumptions we're going to rely on. For our pictures, this is some of the terminology, so you can refer back to it.
We'll just introduce the very first approach to set things up. So here's our network model. Let's assume you're using HTTP and you want to access the server. Normally, you create an IP datagram. The source address is yours. The destination is that of the server, S. It contains some data: a TCP segment with an HTTP request, requesting a web page. If there's a firewall on one of the routers and that firewall has a rule that says block anything going to destination S, what happens? Can I access the website? No, we can't access the website. You did this in your last homework, and you can see your scores for the homework on the website now.
You saw that you can easily set up the firewall to block access to that server by dropping packets destined to IP address S. So this is simply saying that if the firewall blocks destination S and we use HTTP in the normal way, we cannot access the website. So we'd like to see some techniques for bypassing that firewall. Can we still access the website using some other technique?
What if there's no firewall? From a security perspective, what does a malicious user in the Internet know? If there's no firewall, our packet goes to the server and the server sends a response back. We can communicate. What I list here is the default behavior when we're using HTTP and nothing is blocked. If you talk to the server S, what can others learn? Well, the firewall, even though it's not blocking you, knows you're communicating with the server. How does it know that? Based upon the IP addresses in the packet. The firewall can monitor this packet and see that you are talking to S. We're only using HTTP, there's no encryption, so the firewall can also read the data. Let's say someone is at one of the routers out in the Internet here. They can also see it's you talking to the server, because they will see that the source is you and the destination is S. And they can also see the data, because the data is not encrypted. And when the server receives the packet, the server sees that the source IP address is yours. Therefore, the server knows it's you that's contacting it.
In this very simple case, none of our security requirements are met. We don't get any security when we're using HTTP on its own, because anyone between me and the server, including the firewall or others out here, can see the data. They can see who's communicating based upon the IP addresses. And the server knows it's you communicating with it, again based upon the IP addresses. So this is the baseline scenario.
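As a rough sketch of this baseline, the Python below models a packet, the simple destination-based packet filter we assumed, and what any observer on the path can read. The class and function names are hypothetical, and the two addresses are placeholder documentation addresses standing in for "you" and the server S.

```python
from dataclasses import dataclass

YOU = "192.0.2.10"    # placeholder address for your computer (assumption)
S = "198.51.100.20"   # placeholder address for the server S (assumption)


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes
    encrypted: bool = False


def firewall_allows(packet: Packet, blocked_destinations: set) -> bool:
    """Simple packet filter: drop anything addressed to a blocked destination."""
    return packet.dst not in blocked_destinations


def observe(packet: Packet) -> dict:
    """What a router, the firewall, or an eavesdropper on the path can learn."""
    return {
        "who": (packet.src, packet.dst),                       # always visible in the IP header
        "data": None if packet.encrypted else packet.payload,  # readable unless encrypted
    }


request = Packet(src=YOU, dst=S, payload=b"GET /index.html HTTP/1.1")

print(firewall_allows(request, {S}))    # False: the rule for S blocks the request
print(firewall_allows(request, set()))  # True: with no rule the request goes through
print(observe(request))                 # plain HTTP exposes both the endpoints and the data
```

With plain HTTP, both fields returned by `observe` are filled in, which is exactly why none of the requirements are met in the baseline.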
With HTTP on its own, there's no security. We want to build upon that and see what security requirements we can meet.