Hello everyone. Welcome to Computer Science E1. You've only seen me for a few minutes before. I'm Dan Armandaris. Hi again; it's been a few weeks. Today, and then not next week because it's a holiday but the week after, we're going to be talking about the Internet. There's a lot of scary stuff on the Internet, as you probably know, and it's not just the content itself but also what's inherent to the Internet. A lot of what you're going to hear today and two weeks from now might scare you a little bit, and that's actually okay, because we're also going to talk about some of the remedies: how you can protect yourself against some of these pitfalls of the Internet itself. Many of you, I'm sure, have used the Internet quite a bit, and some of you are probably even using it right now. One consequence of this ubiquity is that the Internet has become very, very redundant in many ways. There are a lot of servers out there that replicate the same content and show you the same thing, so that if one happens to go down, another can pick up right where it left off. You've probably also heard of the way that the Internet routes traffic from one point to another and is able to do this in a somewhat secure way. So to take a look at this, I first want to show you something that I think David might have mentioned briefly, but I do want to touch on it one more time, and that is a command called Traceroute. It's a command that you can use on Windows, Linux, or a Mac that allows you to follow a request from you to a server somewhere else, whether it be right next door, across the country, or somewhere else in the world. It lets you see what hops your request takes to get to its destination, because obviously we're not directly connected to every point on the Internet.
The question at hand here is: how does my computer contact another machine that it has perhaps never contacted before? Maybe, for example, CNN.com, or maybe even CNN's Japan version of their site. Maybe I have never actually gone to this site, so how does my computer know where to go? To look at this, let's do a demo first. So there's this Traceroute command, and a lot of computers actually have it. Now, obviously, one of the things I'm doing is typing it into a command line interface, which is a more archaic way of issuing commands or running applications on your computer. If you have Mac OS, it's built in as the Terminal; if you have Windows, it's built in as the DOS or command line window; and on Linux it's inherent to your installation. With Traceroute, I enter the domain that I want to visit. In this case, let's say that I want to try CNN.com. When I hit enter on this Traceroute command, I'm telling Traceroute that I want to trace the route from my computer to a server somewhere else that is called CNN.com. So let's take a look at what's going on here. When I first hit enter, a whole bunch of stuff happened very, very quickly, and now we're getting some stars, some asterisks, which isn't very interesting, but we have enough information right now to see what is going on. Take a look at the very first line up there. It says: Traceroute warning, CNN.com has multiple addresses. Keep that in mind. This is part of that redundancy thing that I was telling you about before. Not all websites do this, obviously, but we'll talk more about this in just a little bit. So what we did is ask this command to perform a Traceroute from my computer to CNN.com.
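For anyone who wants to repeat the demo from a script rather than by typing into a terminal, here is a minimal sketch in Python. It assumes the tool is installed under its usual name: `traceroute` on Mac OS and Linux (on some Linux systems it has to be installed first) and `tracert` on Windows. The function names `traceroute_command` and `run_trace` are made up for this illustration.

```python
import platform
import subprocess


def traceroute_command(host, system=None):
    """Return the argument list that traces the route to `host`."""
    system = system or platform.system()
    # Windows ships the tool as `tracert`; Mac OS and Linux call it `traceroute`.
    program = "tracert" if system == "Windows" else "traceroute"
    return [program, host]


def run_trace(host):
    """Run the trace and return its text output (needs network access)."""
    result = subprocess.run(
        traceroute_command(host), capture_output=True, text=True
    )
    return result.stdout
```

Calling `run_trace("cnn.com")` would produce the same kind of hop-by-hop listing shown in the demo, though the hops themselves depend entirely on where the command is run from.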
You can see that that's what the next line tells us, and it also gives us this IP address, this numeric address that allows my computer to communicate with CNN's servers. Now it steps, iteratively, through all of the machines that exist between my computer and CNN.com. The very first one, right here, you can see is 10.0.1.1. So that's something. There's not a lot of descriptive text there, so it doesn't help us out very much. The next one is 140.247.82.2. That is also just numerical, so let's skip that for now. We'll come back to what this means. Now, number three is where things start to get kind of interesting. You can see that what it says is core-sc-1-gw-v1416.fas.harvard.edu. So what does this mean? Well, we perhaps don't know exactly what this means, but we can infer quite a bit from the name. What does the name imply about whatever this is that my computer has contacted? Any ideas what this could be? Yeah, that's a very good way to put it. So it's going through a server located at Harvard, but we can narrow it down a little more specifically than that. Perhaps SC is representative of something like the Science Center, so maybe this is some sort of core machine in the Science Center. That's just a guess. SC could represent a whole lot of things, but this does actually tell you something about the server itself. Okay, so let's keep going to step four: bdrgw2-blblblbl.core.net.harvard.edu. What you are seeing are the steps that a request issued by my computer has to take when it goes out to the internet: it has to go through a variety of servers on Harvard's campus before it can actually be released into the wild. So that's what we are seeing now. These next lines, four and five, are the gateways between Harvard's network and the broader internet at large.
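The kind of guessing done above with the hop-three name can be sketched in a few lines of Python. Domain names read from most specific on the left to most general on the right, so reversing the dot-separated labels gives the "zoom in" order: edu, then harvard, then fas, then the individual machine. The helper name `split_hostname` is made up for this illustration, and what the hyphenated pieces such as "sc" stand for remains a guess, exactly as in the lecture.

```python
def split_hostname(hostname):
    """Split a DNS name into its labels, most general first."""
    labels = hostname.lower().rstrip(".").split(".")
    return list(reversed(labels))


hop3 = "core-sc-1-gw-v1416.fas.harvard.edu"
labels = split_hostname(hop3)
# labels is ["edu", "harvard", "fas", "core-sc-1-gw-v1416"]

# The machine-specific label can be broken up further for guessing.
parts = labels[-1].split("-")
# parts is ["core", "sc", "1", "gw", "v1416"]
```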
So the very next line that you see after one of these core Harvard servers is a boston.level3.net server. There exist these internet service providers, and some of them are bigger than the ones that you might use at home. You might be familiar with names like AOL or Comcast or Time Warner or a variety of other services that provide internet directly to your home. Those are ISPs, internet service providers. But there are other internet service providers that work for commercial or educational entities and act as large backbones on the internet. They have a collection of servers that participate in this routing of traffic on the internet. This company right here, Level 3, is one of those internet service providers. It does provide a backbone, in a sense, to the internet itself, because now my request is on their network and it bounces around a few of their servers, as you can see, from Boston to New York to Washington to Atlanta. That is where my request ultimately ends up before we are unable to find any more information here. So what's happening is that, one second, what's happening is that my request to cnn.com starts from my laptop, goes to the access point, then bounces around some of Harvard's servers before it's finally released into the internet at large at level3.net. Then, within Level 3's servers, it goes from a server in Boston to another server in Boston, to a server in New York, to a server in Washington, before finally going to its destination in Atlanta. So presumably, then, CNN's servers are hosted somewhere in Atlanta. And there's one more piece of information I can show you that's kind of interesting: it's this timing stuff on the far right.
For example, you'll see 25 milliseconds down here, and at the very top you'll see stuff like one millisecond or 0.7 milliseconds or something like that. That is the round-trip time: how long it takes for a probe from my computer to reach that device and for the reply to come back. So it makes sense that the time in the very top one is the shortest, because that's the closest device physically to my computer right now. I'm not using Harvard's Wi-Fi, because we have to do some other things with the internet that I can't use the Wi-Fi for, but I'm using this little device down here. It's a little Apple Airport Express, and that is my access point. It basically allows my computer to connect to it wirelessly, and I can send requests to the internet that way. That's that very first hop. Now, what is interesting about this timing is that it only takes 25 milliseconds for a request from my computer to go through all these servers down to a machine somewhere in Atlanta and for the answer to come back. Did I see a question? Yeah, so the IPs, these numbers that we see on the right, do actually correlate relatively well not only with companies but also with geographic locations. It's not precise. I mean, there comes a certain point where we can't really identify exactly where a computer is, but many of these numbers have been given out in blocks, and these blocks have been given or sold to companies like Level 3, or to Harvard or MIT, or to some other company or institution. Just by knowing the relationship between these numbers and their owner, you can perhaps identify the geographic location, or perhaps a little more information about the computer itself. But we'll talk more about that when we talk specifically about IP addresses. Okay, so this is pretty interesting, but what happens if... Oh, did I see another question? Internet service provider. ISP is Internet Service Provider.
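The point above about addresses being handed out in blocks can be made concrete with Python's standard `ipaddress` module. This is only a sketch: the slash notation for a block (a base address plus the number of leading bits that are fixed) is not introduced in the lecture, and the block used here is the 10.0.0.0/8 range that is reserved for private networks, which is why a home-style access point such as the one at the first hop, 10.0.1.1, can use it.

```python
import ipaddress

# First hop in the trace: the access point the laptop is connected to.
first_hop = ipaddress.ip_address("10.0.1.1")

# A block is written as a base address plus a prefix length.
# 10.0.0.0/8 is one of the ranges reserved for private networks.
private_block = ipaddress.ip_network("10.0.0.0/8")

in_block = first_hop in private_block       # does the address fall in the block?
block_size = private_block.num_addresses    # how many addresses the block holds
is_private = first_hop.is_private           # the module's own private-range check
```

The same membership test works for any block whose owner is known, which is the basis of the "this number belongs to that organization" reasoning described above.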
So this is pretty interesting, but this is a request that is initiated and terminates within the United States. What if we wanted something that's a little more overseas? So instead of cnn.com, I'll do cnn.co.jp, which is the Japanese version of the same site. I'll hit enter, and we're doing a trace route again. There's a whole bunch of stuff here again before the hops quit on me. Note that each jump, each step here, is considered a hop. That just means that I'm going from one location to another. There's nothing between these two locations except a wire that's transferring my request, and at each of these locations there is a server, or more specifically a router, that receives the request before it passes that request on to subsequent routers or servers. So okay, let's take a look at what's going on here. A lot of this stuff, at the top especially, looks very similar to what we saw before: my request first has to go through my access point. So the first hop is going to be the same, right? That makes sense, and there's no way around that. Then the route that my request takes through Harvard's network is also the same, which is not so much a coincidence as something that is perhaps configured that way intentionally. Then we can see that it goes from Boston to New York, but now this is where things start to change a little bit. Rather than being routed along some path that's local to the country, it is rerouted to some set of servers that are perhaps responsible in some way for sending at least some of the traffic out from the United States to the outside world, to other countries besides the United States.
So you see it gets bounced around again at the Level 3 ISP quite a few times in New York before it gets sent all the way over to this. It goes from Level 3, and at hop 11, which is right here, we see that something happens. So what might be happening? Again, you're not going to know exactly what's happening, but just from the context of this name, what might be happening? Satellite? So why STTL? Oh, STTL, that's a little bit later, that's the next line. What I'm talking about is line 11 right here, the one that's highlighted. There's some change between line 10 and line 11 that indicates something happened to my request. Any ideas? Now I'll give you a hint: look at the last part, just before the dot net. It changed ISPs. Level 3 passed my request off to another ISP. This ISP might have some more direct route to CNN's servers in Japan for some reason, so Level 3 makes that decision. It hands my request off to this other ISP, and then this other ISP takes the request the rest of the way. So it goes from NYC dot something, so that's again New York City, down to STTL, which might mean satellite, but it could also mean Seattle, and I think the next two characters after that, WA, indicate that that might be the case, that this is Seattle, Washington. So it goes all the way from New York City to Seattle, and then from Seattle, take a look at what happens: it goes from Seattle directly to Tokyo. What this implies is that there is some really long cable, some really big link, that exists between Seattle and Tokyo. Whether it is a satellite link or a large undersea cable is not really clear from just the data that we are provided. Perhaps if we had some more knowledge about this we might know, but most likely it is an undersea cable between Seattle and Tokyo. Yes? Multiple IP addresses under the same hop, like hop number 8.
That could be because, in order to complete this table for us, it has to send multiple requests to each of these servers, and it could find that, to reach a certain destination, it could be going through any one of these three servers. So that could be what is going on there. Okay, so I want to restart this, because now I want to be a little more patient and see what might happen to my request as I let it continue. Hopefully it will give us a little more information, but you can see that there is some connection between Seattle and Tokyo that allows my request to proceed. Here we go, we have some more data here. We can see that now my request is actually in Japan; the .jp is a dead giveaway for that. So now my request is in Japan before it finally ends up at CNN's servers local to Japan. Now, what's neat is that if you take a look at the times for these hops, you can see about how long it takes for my request to go over the Pacific Ocean from Seattle all the way to Tokyo. If we look at that same step again, step 13, which is the last Seattle step before it goes to Tokyo, we'll see that the number of milliseconds jumps by about 100 milliseconds. So that one link between Seattle and Tokyo adds, on average, about 100 milliseconds, out and back.
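As a rough sanity check on that jump of about 100 milliseconds, here is a back-of-the-envelope sketch in Python. It rests on several assumptions that are not from the lecture: approximate city-center coordinates for Seattle and Tokyo, a cable that follows the shortest path over the Earth's surface (real cables do not), and signals in glass fiber travelling at roughly two thirds of the speed of light, taken here as 200,000 km per second. Since traceroute reports round-trip times, the estimate is doubled.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_S = 200_000.0  # assumption: about 2/3 of the speed of light


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates (assumed for illustration).
seattle = (47.61, -122.33)
tokyo = (35.68, 139.69)

distance_km = great_circle_km(*seattle, *tokyo)
# Out and back, converted from seconds to milliseconds.
round_trip_ms = 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000
```

Under these assumptions the distance comes out at roughly 7,700 km and the round trip at a bit under 80 milliseconds, so an observed jump of about 100 milliseconds is in the range that physics alone would lead us to expect, with the remainder plausibly due to a longer real cable path and processing at each end.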
So this is actually pretty interesting. That's a long distance, and 100 milliseconds is pretty quick, but in the grand scheme of things 100 milliseconds is also a non-negligible amount of time. That's a tenth of a second. This is something called latency: the amount of time that it takes for your request to be received by the server and then for a response to be received by your computer. A high latency means that, once you issue a request, it takes a long time for you to get a response. So the latency is inherently going to be higher from my computer sitting here in Cambridge, Massachusetts, to a server all the way in Japan than it will be from my same computer to one in Atlanta, just because we're limited by physics here. The speed of light can only transfer this request so quickly, and that's really the limiting factor in the latency in this particular instance. A lot of companies have servers around the world for this very reason. Google, for example, has clusters of machines. They basically have, literally, huge shipping containers full of machines, all self-contained, and they drop these all over the place geographically, just so that the latency to reach one of their Google.com servers goes down, because the closer a server is to you physically, the less time it will take for your computer to reach it and receive a response back. So it would be a killer for Google, in a certain sense, if they had all of their servers only in California, because that means that all of us who live here on the East Coast are going to spend a non-trivial amount of time waiting. Well, okay, so it is sort of trivial, but it will feel slower to us. We notice that tenth of a second difference, and it adds up over time, so we would notice that Google.com is a little bit slower than perhaps some other websites that exist a little bit closer. Now this isn't to say that
latency should be confused with speed. It's just that there's this delay, and it's that delay that's important, because there's also this other concept of download speed. You might be receiving data at a very fast rate, and that's a different concept altogether. That's just something that you should keep in mind. To give you an example, I might have a cell phone that can connect to the internet over 3G, or even now the new marketing term, 4G, which is just a fancy souped-up 3G; I think it's technically not 4G quite yet. They say, okay, the download speeds are actually really good on these devices, which is actually kind of true, but the reason that these devices feel slow is latency. It takes time for a request from a cell phone, from a 3G device, to make it over the air to the towers and get sent to the final server. But once that connection is made, once you've performed that initial connection, the downloading of the data proceeds at a pretty good rate. So there's this difference between latency and speed that you should be mindful of, because it does make a difference. That is a good question to ask: okay, if I'm trying to connect to a website and this website seems to be slow for some reason, is it that the server is sending me the data slowly, or is it that there's a high latency, so that I perform a request and it's taking a long time for that request to reach the server or to come back, or both in combination? That's a good question to ask, especially when you're trying to troubleshoot some particular problem. Okay, so let's go back to this trace route idea. Notice that there are a couple of other things that we can talk about in terms of the internet just from this same data. Notice that we have on the left some naming scheme for all of these servers. So, for example, we might recall we had the core dot
harvard dot edu one. That server actually had a name, and adjacent to that is that server's IP address, just like when I tried to go to cnn.com and it told me that it was using a specific address for cnn.com. That was also something interesting. So what's going on in the background? We've talked about this, not really at length, but David has mentioned that the way computers contact each other on the internet is through this idea of an IP address. An IP address is very important. It is basically what it sounds like: an address for every computer on the internet. If your computer is going to contact another computer on the internet, both of those machines have to have an IP address; otherwise the connection cannot be made. An IP address basically looks something like this: w.x.y.z, where each of these is a number in the range 0 to 255. So you could have 0.0.0.0 or 255.255.255.255, and you might have some other ones, like this one over here, 129.250.4.190. This is a valid IP address. But there's something special about this range. What does that number represent? We talked about having 256 discrete values. What does that mean? 8 bits, or a byte, right, exactly. So we have exactly 8 bits, or 2 to the 8th possibilities, for each of these so-called octets. Each of these is just called an octet, by the way. We have 4 of these octets, so in combination we have 8 bits plus 8 bits plus 8 bits plus 8 bits: 32 bits total. We have 256 possible numbers in this slot, 256 possible in this slot, and so on and so forth, 4 times. You can multiply all those together and find the total number of addresses that are possible in this IP space. So what we're talking about is 2 to the 32, or 256 multiplied by itself 4 times, which is approximately 4.2 billion addresses. That sounds like a lot, but it's really not in the current
way that we do this, because, I mean, how many people live in the world right now? What is the human population? Yeah, it's about 6 billion or so. Granted, not everybody has a computer, but I know that in my case, let's see, even in this room I have 2 devices that have IP addresses, and at home I have 2, 3, 4, 5 more devices that have IP addresses. So just me, I'm being selfish in taking up 7 of these IP addresses. Just one person, me, is using up 7 of these 4.2 billion. So okay, you assume that there are quite a few people, perhaps in more affluent countries, who have access to a multitude of devices. Even single computers can have multiple IP addresses associated with them; we'll talk about how that's possible in a little bit. But we are quickly running out of these addresses. That sounds very sensationalist, and in a way it kind of is, because, if you recall, one of the things I talked about earlier was that these IP addresses are sold to companies or institutions in blocks, and it is those blocks that we have run out of. It's not to say that we're using, right now, every single address that is possible in this so-called IP version 4 address space. It's just that all of these IP addresses have been accounted for and can possibly be used. This is important because it means that no more companies can come in and say, okay, I want a block of IP addresses so that I can host my own website. That's just not possible unless they buy an unused block of IP addresses from some other company that just happens to have a couple lying around. So there is this big idea of running out of IP space, and there are actually a couple of solutions for this. One of them pretty much all of us are currently using: it's this idea of network address translation. What this actually means we'll talk about a little bit later, of course, when we have a little bit
more network terminology under our belt, but it basically allows you to share one IP address among a couple of different computers. That's something that helps out a lot, because I said before that I have seven devices that are using IP addresses, but really, after network address translation, or NAT, it boils down to about three addresses that I'm ultimately using. Then there's the other solution that people have been talking about, and I'm sure there are other smaller solutions as well, and that is this move to something called IP version 6. An IP version 6 address is much, much, much, much bigger than this. This address, we said, has two to the 32nd possibilities, about 4.2 billion possibilities, but an IP version 6 address has two to the 128. Now pay attention: this doesn't mean that we have four times as many because 32 times 4 is 128. That's not what it means at all. Whereas we had about 4.2 billion here in this one, with this one we have about 3.4 times 10 to the 38 addresses, so a 3, a 4, and then 37 more digits after that. That's how many possible addresses there are in a 128-bit space. That is an enormous number of IP addresses. And just to give you an idea, let's see, if we gave every person in the world a block of IP addresses, we could give everybody something like five point something times 10 to the 28. So you could have five point
something times 10 to the 28th addresses just for yourself. That's a lot of iPads to have, or a lot of computers that you could have on the internet all at once. Now, this is an oversimplification of IP version 6, of course. A lot of the functionality that's been added to it is for more efficient routing. This stuff that we saw up here, the routing of a request from one computer to another, happens because each of these servers has to perform some calculation and say: okay, I think this request is headed for cnn.co.jp, so it should go in this direction; this request is headed for cnn.com, so that goes in this direction. That's a decision that each of these servers, or so-called routers, has to make. One of the things that IP version 6 allows us to do is more efficient routing. So really it's sort of a white lie that all of those addresses are available. I mean, there will be plenty, but much of the space will be used to help make this whole process a little more efficient and a little faster, which will be nice. But obviously we have enough addresses in IP version 6 that we're not going to have to worry about this for a very long time. At least that's the hope. I'm sure they said the same thing about IP version 4 when Al Gore decided to invent the internet. But still, we're buying ourselves some time with that, just because, I mean, I think this is more than the number of stars in the Milky Way Galaxy for every person. I think we'll be quite okay in this case. So okay, moving on, or moving back, rather, to IP version 4. We have all of these addresses, and all of our computers are assigned an address, but this doesn't really answer the original question that I had asked before I went off on this long tangent about IP addressing: what makes the connection between the name that appears on the left side of this trace route and the IP address that appears in
parentheses? And what about that warning that we saw, that CNN.com has multiple addresses associated with it? What does that mean? Well, there is this service on the internet called DNS, the Domain Name System, and its servers, the domain name servers, are basically like an address book for the internet. That's an analogy you can use to think about this. Let's say that you wanted to mail a company, Google for example. What you would do is say: okay, I know the name of this company, Google, but I don't know their address, so I'm going to look it up. I'm going to go to a phone book, crack it open, go to Google, and see what their address is. Then you will know their address, so you can send some correspondence, maybe mail or a package or what have you. So in this case a domain name server is basically the address book of the internet. Of course I'm simplifying quite a bit, because there are many domain name servers on the internet. But we can actually take a look to see what the IP address of a given domain might be, and a domain is just this name representation for all of these servers. So CNN.com, for example: host CNN.com. There's another command that exists in Mac OS and Linux, called host, that allows you to perform this lookup, to look up an IP address given the domain of a server. When I hit enter, we can see this is the result: CNN.com has address 157.166.255.19. It also has address 157.166.255.18, that 26, that 25, that 26, and 244, 224, .25. So this has a whole bunch of IP addresses associated with one domain. Now, leaving behind that phone book analogy from before and thinking in terms of the internet, why might we want this? Why is this useful to have? If we go back to our idea that each one of these IP addresses represents one machine, one physical computer somewhere on the internet, what does this allow us to do if we have CNN.com pointing to a
variety of these servers? Right, exactly. If we have too many requests for one server to handle, maybe because it's an older server, or maybe because there's a very popular news story that just broke on CNN and they're getting a lot of extra traffic, then perhaps that one server is going to get overloaded with requests. So that's certainly one thing. This is a concept called load balancing, where a system administrator will try to balance the load of some traffic across multiple machines, so that if one machine gets a lot of traffic, that traffic will bleed off to another machine that can help it with the load. So this could certainly be a load balancing issue. Another thing could be: what if one of the servers just happens to go offline? Maybe its hard drive dies, for example, and it can't read the content that the person is requesting in order to send it back over the internet. Or, perhaps more likely, maybe the cable to one of the machines has been cut or has gone bad, or the port that the cable connects to on the computer has gone bad, or something like that, and that computer is no longer reachable. But there is still a variety of other machines that we can contact to retrieve the same data that we wanted before. So this is an important concept: not only load balancing, but also giving us some duplication of all of this data. Now, I think there is one more neat trace route example that I want to do, and that is, let's see if I can remember the domain, egyptse.com. Probably many of you have heard of the recent events that have happened in Egypt, and of course I'm not trying to make light of the situation in any way, but in terms of us looking at the internet, one of the more interesting things that has happened there is essentially the entire
internet being shut down for the country as a whole. So if I do this trace route to egyptse.com and take a look to see what might be going on, we'll notice a certain number of hops between my computer and this egyptse, which is basically just Egypt's stock exchange website, which is very important for their commerce. Let's see, did I see a question before? How do servers pass information when there's an abrupt failure, like a hard drive? Okay, so that's a good question. It's actually a pretty complicated answer. Say we have a catastrophic failure in one server and not in the rest. For example, cnn.com, as we saw, had a whole bunch of IP addresses associated with different servers where content was available. What happens if one of those machines happens to go down? Well, it really depends on when that happens. If you happen to be making a request of that server as it goes down, most likely you're not going to receive a response from the server, and so your computer will be in limbo. If you've ever tried to visit a website and it just seems like it's trying to load and trying to load, and you're not seeing anything on the screen, but you can visit other websites just fine, that's the same idea: the request has been lost somehow, and the computer hasn't yet realized that that's happened. Now, the way that load balancing works is actually pretty good. There are devices, and other servers, that perform this load balancing for these servers, and once a load balancing device detects that a server has gone down, it will automatically reroute all traffic to the other servers. So subsequent requests, ones that are not initiated right at the moment when that happens, will be rerouted properly, and all of the servers most likely have the exact same content from one to the next, so that any one server can pick up the slack of the next. So if you visit the website of one of those IP addresses, so this is actually a
pretty good example, I think, if we wanted to take a look at what happens when we try to visit one of these IP addresses. So obviously, when we go to a website in our web browser, what we type in is the domain name of that website: cnn.com, google.com, what have you. Maybe I will do Google instead of CNN, just because there's always a lot of distraction on cnn.com as people read the latest news. So we can see that Google itself also has a variety of IP addresses associated with its domain name. So what happens if, rather than going to http://google.com in my web browser, I go to this IP address itself? Well, technically this should work. I should just be able to go to this IP address, hit return, and I go to google.com. Similarly, I can go to any one of these other IP addresses. So rather than the one that I went to before, which was 72.14.204.103, I'll go to 204.147, and we should see that it is exactly the same thing. And that's in fact what we see. You can see up at the top that what I've entered into my web browser is not the domain name but rather the IP address. And this is because all this domain name stuff does for us is abstract this idea of visiting a website away from having to use these IP addresses. It would not be very user friendly if, to go to google.com, you had to remember the address 72.14.204.147 and type that in every time. However, what your computer is doing when you type in a domain like google.com is that it looks up that address information, and then it contacts that address on your behalf. So this works both ways. This is how the computer actually does a request for a web page. Yes? What happens if you type in an IP address without a website associated with it? So this is true: there's a variety of computers that are connected to the internet that are not web servers. So
for example, my computer is sitting here and it is not serving any web pages. It doesn't have that capability because it's turned off right now. And so if I were to find the IP address of my computer and type it into a web browser, basically nothing would happen. It would eventually time out. It's the same idea where a request gets sent out and there's no response, there's no response, there's no response. So we'll just see a white page or a blank page with sort of a spinning globe for a while, before the browser, or the computer more generally, just decides to give up on contacting it. So just because a computer is on the internet and has an IP address doesn't mean that it is necessarily acting as a server, and this is an important distinction. So there are the concepts of clients on the internet and servers. My computer right now is a client, because all it's doing is issuing requests of other servers. And we've talked about what servers were, and in this context it's the same sort of thing: the servers actually serve content to me. So there is a machine that has this IP address somewhere on the internet, and that machine is a server, perhaps in the physical sense, where a server can be some sort of big, loud, noisy machine that happens to do a lot of stuff. But it doesn't necessarily have to mean that. The term has sort of been conflated. It's possible for my own computer, a small tiny MacBook, to be a server for particular types of things on the internet, but we'll talk about that at a later point in time. Did I see a question over here? Yes? Someone can buy up internet real estate, which is big, and get whatever website name they want associated with an IP address. So is that also a cause of running out of IPs? Okay, so just to repeat for the camera. So: going to a website that you can actually purchase domain names from, godaddy.com or what have
you, and actually purchasing some real estate online: does that contribute to this problem of running out of IP addresses? Technically no, not really, because this concept is in fact separate. So like we talked about, there exists this domain name server. There's this service online that acts as an address book between a domain name and an IP address. Just purchasing a domain name doesn't give you an IP address. You just have that one domain name. What you have to do when you configure your domain name after you purchase it is point it to an IP address of a machine that is acting as a server. And this is one of the things that's particularly confusing, I think, for people that want to make a website and host it online with their own myname.com or what have you. This is difficult because you have to do two things. You first have to purchase the domain name, which is just that: it's just the name. It doesn't give you any rights. It doesn't give you a computer to host all of these files. It doesn't give you an IP address of a server that you can put your files on so that other people can access them. That's a separate thing, and it's called hosting. So typically when you get a domain name, there are two steps: you first have to buy the domain name, and then you have to buy a host. With a host, you're basically purchasing the capability to put your files on somebody's computer, and that computer will then host those files for you. They will give you an IP address and say, okay, this server's IP address is such and such. So then you go back to the registrar, the place where you bought the domain name, and you input that data into the DNS. They will generally have some DNS setup that allows your domain to then point to an IP address. And so that's the two-step process that you have to do. And so buying a domain name by itself
then does not contribute to this problem. But maybe if you were to bring your own server online, and it has its own IP address that you then point the domain name servers to, that would then contribute to this problem. So in absolute terms, yeah, that's how I would answer that. Okay, any other questions before we move on? Yes? Crowdsourcing. Let's see, so is using our own computer as a server what crowdsourcing is about? So yes and no. I would say in the generic usage of the term you might be able to say that, yes. But I think what people specifically mean when they crowdsource something is that they give some task to a whole bunch of people who then complete that task, and it doesn't necessarily have to be on a computer that is a server. But there do exist services like BitTorrent, for example, which is a way of downloading. Right now BitTorrent has this reputation of being for downloading illegal movies and this sort of stuff, but there are actually a lot of legitimate uses for BitTorrent. And one of the things that BitTorrent does is allow your computer to act as both a client and a server, in that as you download a file you're also serving it. So to make this a little bit more concrete, let's say you download a file that is basically this large. As you download chunks of this file, you have then downloaded, say, this quarter of it. So your computer then says, okay, I have successfully downloaded this quarter of this file, so I will serve it out to whoever doesn't have it. Which is sort of interesting, because that means that rather than us requesting some data from one server and retrieving all of it from one computer, we are then able to download data from other servers elsewhere, so that we're not overloading one server in particular. And this is sort of a crowdsourcing solution, in a sense, to the downloading problem, because you're distributing this download to a whole bunch of
machines that are able to not only download the portions that they don't have, but also act as a server so that others can download the portions that this machine already has. Okay, but we're sort of getting ahead of ourselves a little bit with that. Any other questions before we move on? Okay. So what is particularly interesting, I think, in recent history is this idea of Egypt sort of falling off the internet. And if we take a look at what happens to our routes, just notice a couple of things that are particularly interesting. I mean, the first few hops are what we've seen already, then very quickly does it... oh man, this is not a very good Traceroute, is it? I have a better one on my computer at home if we need to do this. Let's see. Okay, so here. Alright, oops, one second. Let's see, I just have to get this up. This is not working. One second. I have this information right here. So what I did on my computer at home was actually do this Traceroute. Sometimes in some networks it becomes sort of difficult for you to do a Traceroute, but obviously when you're on a different network you might get a different direction, or a different set of hops, to a particular server. So I did this before I even came here today. Hopefully this is the same one. Yep. And so we can see the Traceroute from my computer at home, which is in west Cambridge, all the way to Egypt. All we're seeing right now is just the screen of my computer at home, which already performed this Traceroute. So we can see some interesting things. The first few steps obviously are local only to me, as my request goes on to Comcast. Then Comcast passes it off to its own higher ISP, to whatever backbone of the internet they decide is useful for this request. Then we can see that it goes from New York, let's see, all the way down to Newark, to Milan, to Palermo, and then all the way down to, let's see, eventually we get
here we go, telecom.egypt.seabone.net. And hopefully I can scroll down a little bit, and now finally we can see some of these last bits. These are all in Egypt now. So from Palermo in Sicily we went down to TEdata.net, and so on and so forth. So if you look at some of these domains and you don't really know what they are, there are a couple of useful things that you can do to try to figure out what exactly is going on. The first one is obviously just to try to visit that domain as a website. So TEdata.net: let's see what happens when I try to visit that website, just because what I want is to be able to determine what is going on with my request. So TEdata.com... we can see that we have reached this website. TEdata.net, I just said that. I made that same mistake trying this at home. So here, I think this is going to be what we want. Obviously this web page was a little bit slower loading. Don't confuse slowness with latency. Notice that it took a while before things started showing up; that was this latency that happens because my request is going all the way from Cambridge to Egypt on the other side of the world. So I can't read Arabic, but luckily there is an English button right here, which is very convenient for us. And now we can see some additional things. The tagline up there is "the fastest internet network in Egypt." So okay, now finally we have reached perhaps an ISP that exists in Egypt, which this request is being sent on to in order to go to the stock exchange in Egypt itself. Now what is interesting is all of the stuff that happened when the internet actually died off. So this is the website that we've been trying to perform a request to, the Egyptian Exchange at egyptse.com. But if you actually take a look at all of the data that has been compiled by people whose job is to monitor networks and to work on networks, it looks very, very interesting to see what happened. So all of this is the same sort of idea. So all of these routers,
all of these servers that we see that are hops, are basically called routers. What that means is that they do what their name sounds like: they are supposed to route traffic on the internet. And these routers can be owned by backbones on the internet, like that Level3.net, or some of those other big ones that we've seen. What was the other one, NTT.net or something like that? All of these own routers, and they're very big machines, and their sole purpose in life is to take some data, determine where this data is destined, and then pick the right path for it to go. So one router might have a whole bunch of connections to other routers. We've seen that some of these routers have been in New York, for example, and those routers had to make a decision to send the data either to Atlanta or to Japan. And so that's a decision that it has to make. But on a more generic level, it says, okay, well, I know that networks in a specific IP address range, let's say 140.247.something, should go in this direction, but 140.246 should go in this direction. And so these routers then pick the right way for this traffic to go. And so what we are looking at is some information that each of these routers actually sends out. These routers, in addition to knowing where to send the data, also have to tell other routers what networks they are capable of communicating with, because that is how routers know how to make a decision. It says, okay, if I am a router and I can communicate with 140.246.whatever.whatever and 140.247.
whatever and nothing else, then the other routers are probably not going to send me data for some IP address outside those ranges, right? That wouldn't make any sense. So what we are seeing is what these routers reported as being capable of sending data to on their networks. This is the quantity of networks that these routers can actually communicate with. And so initially, before all of this happened, the aggregate of all of these routers in Egypt, and all of these big connections like TEdata.net, for example, and a few other large-scale ISPs that exist in Egypt, were reporting about 3,000 networks. And each of those networks is probably one of these subnet blocks that I had been talking about, so w.x.something.something. If I have an address like that, then that is considered sort of a subnet network. And keep in mind that that is actually still quite a lot of IP addresses. If I have 256 possible numbers in the Z portion and 256 possible numbers in the Y portion, combined that is 256 times 256, or about 65,000 addresses in a block that has some number dot some number dot whatever. So this is still a lot of addresses for Egypt. So we are looking at about 3,000 of those subnets that dropped very quickly to essentially zero. All of these routers had said, okay, I am responsible for, or I can pass data off to, maybe a couple dozen networks that exist in Egypt. But over time these routers said, I cannot send data to networks in Egypt, and so other routers just believed them, and they did not route any traffic. So what happened was that no traffic was able to route to Egypt itself, or route out of Egypt as well, because these same routers were unable to report out. Do the routers constantly check? So there is this protocol called BGP that you really don't have to know too much about, but it is the protocol that these routers use to communicate with each other, and that
is what we are looking at. So whenever they have an update, there is probably a time that that update will live, so maybe like an hour or so, after which that update will expire and then it will fetch a new update, something like that. It is probably some mechanism similar to that for these routers to report data to each other. So if we take a look at what happened: typically these routers, I guess, do not actually send updates to each other, because if a router is very happily humming along and passing data from one network to another, it doesn't need to update the other routers and say, okay, I have an update, I can no longer communicate with this specific network. But what happened is that this blue graph that we are seeing here is the number of updates sent from these Egyptian routers, and we are seeing a huge spike all of a sudden, saying, okay, I have an update on the networks that I can communicate with now. The red bar graph represents the number of withdrawals, which means, okay, I can no longer communicate with an address in this range. And so very suddenly we saw a huge drop-off in the number of networks that we could contact from the outside world in, and vice versa, just at these sort of high-level routers that existed. Now this is actually really interesting to me, just because it shows us something that we have not seen before. It's really been sort of unprecedented at this level to cut off an entire country from all of its internet access. And frankly it worked, but only to a degree. There were a lot of things, by the way, that happened. For example, there were a couple of ISPs in France that actually gave Egyptians free internet access just by giving them a phone number to dial up. And if you had been following along on the Twitter updates for Egypt, you would start to see, after this happened, that some of the French ISPs, and I think there were some others as well, actually gave
phone numbers to people that lived in Egypt and said, okay, connect using dial-up, using your modem. Make a long-distance telephone call over the Mediterranean Sea all the way to France, and we will actually connect you to the internet. So there were still ways that people were able to connect to the internet, but obviously everybody using dial-up all at once is not going to be a very feasible way to sustain an entire country's internet needs. It just happened to be a relatively interesting stop-gap solution. Yes? Right. So in these graphs, what we are seeing is not an immediate and total drop; rather, it does take a little bit of time for these routers to drop off. What accounts for that? Well, there was no kill switch in this sense, like what has been proposed. I forget who, but somebody has proposed recently that the United States have an internet kill switch, so that somebody has the ability to just take the entire internet offline, which seems like a ridiculous thing if what happened in Egypt is any indication of how a population will handle it. Obviously there are a lot of other factors as well, but a lot of people did in fact fight this idea of having lost their main method of communication. It does imply that it took some time. There were several routers that Egypt has, and they had to change the routes on all of them. It could have been that there were only a few main ones that they really had to do, but it does take time to configure those and to allow those updates to actually propagate over the internet so that this will actually occur. Yeah? If Egypt wanted to still stay connected for their own use, but not let information out, is there a way to tell a server, I'm not going to say anything back to you except whatever I want? Not taking it off on one side, but the other routers stopped? So if you want to do like a one-way kill, so that
it's not possible to allow data to come in and go out, that's, in the way networking works, not really possible, just because a lot of the communication that occurs between your machine and another computer is that you initiate a request and you have to receive a response from that machine. Whether it be in the sense of a web page, or even just something like this Traceroute that we had seen, there was two-way communication that existed. And so blocking all communication in the opposite direction would take reworking things to just blindly accept this data, at least in the sense of web pages. There are other protocols as well. One of the acronyms that's pretty common these days that you will hear is TCP/IP. Usually the IP you'll see is version 4, and usually you see these two combined together. TCP is a protocol that we will talk about more next week, and that will, I think, address your question a bit more. But TCP at least requires a return from the server to acknowledge that it has received a request. Whether or not the acknowledgement also returns data is another thing entirely, I think. Okay, any other questions? Yes? So I think this was actually a concern even in Egypt. People were questioning what would happen to essential services that do perhaps need access to the internet, like hospitals, and what else did you say? Like airline traffic, right. So like travel information, for example, or airline control, and hospitals, that sort of stuff. I think a lot of those things can still function on their own, but what they were most concerned about, I think, was the financial aspect of it, which does heavily require data. And at the time when this was in full force, it actually was completely down. There was one network, though. You'll notice that in a lot of these graphs, let's see, not in this one but perhaps the one before, there were actually a few networks that could still do this
and that was because there was one ISP. All the others went down, but there was one ISP in Egypt that maintained its router, and so it was possible for a number of services to keep working. And actually some of the servers of the Egypt stock exchange were on this network, so some communication was possible. It's just that a lot of the other servers were on other ISPs as well. So I had mentioned before, and this ties in nicely with what I wanted to get to, that the internet has a lot of redundancy, but there are a finite number of cables that we have from one country to the next. There are a lot of undersea cables that exist, not only from here to Europe and here to Japan and to China, but even in the Mediterranean Sea there's a bunch of cables. And you might recall from a couple of years back, I don't remember the specifics, that there was a concern that somebody was maliciously cutting or severing some of these ties, because one country, I think it was a Middle East country like Jordan or one of those in that area, was gradually losing the vast majority of its connections to the outside world, just because people were cutting these cables. And so while the internet does have a bunch of redundancy, there are a finite number of backbones, there are a finite number of routers, so that if enough goes down it is technically possible to sort of split the internet. And so Egypt had its own network, where you could perhaps communicate with other people within Egypt, but you would not be able to communicate with any servers outside of Egypt, or from the outside in. That sort of idea is very important to this idea of the internet, this idea of having lots of interconnected networks. But we do have these sorts of vulnerabilities, just because of the physical connections themselves, or perhaps not quite a physical but a virtual connection that exists between these routers, by which each router knows how to transmit data from one
network to the next. This is actually an important topic, and one that we will continue talking about after a 5 minute break.
Hello everyone, welcome back to Computer Science E1. So before the break we were talking a bit about some of these neat ideas in the internet, but really we're talking about a lot of this stuff in sort of an abstract way, like looking at these servers that exist elsewhere on the internet. What happens when we are talking about each of our computers themselves? Well, of course our computers have to be configured to use the internet. And because my computer is actually online right now, it also has an IP address that has been given to me by some server somewhere else. I don't really know where that server is, though I can sort of guess that it's associated with Harvard. But I don't really know what that server is, and I don't have to know that server's IP address. And there are a couple of interesting things that happen as a result of my computer contacting another server and retrieving some information back from it, one of which is the IP address that my computer is going to use to connect to the internet. So we can see here my IP address. We can see that I have been given an IP version 4 address of 10.249.something.something. Now what's important to realize is that some IP addresses that start with specific numbers are meant to be private IP addresses, and what I mean by that is that another computer that is on the internet cannot directly address my computer with this IP address. There's a couple like that. There's a couple of class A ones, and what I mean by a class A IP address is basically this W: if there is a specific number in this W, like 10.something.something.something, that is a class A. So 10.x.y.z is considered private. And again, what that means is that it's not necessarily anything that has to do with my privacy on the
internet, but rather that my computer has been given a private IP address that is not directly addressable by another machine somewhere else on the internet. So my computer right now cannot act as a server, just because it cannot be directly referenced by another computer by its IP address. We'll talk more about what that means and the implications of that when we talk in a little bit of detail about network address translation, which I had mentioned before. Another one that you've probably seen is 192.168.something.something. The 192.168 subnet is another private IP address range that could be given. So if you have a router at home, for example, then on your computer you most likely have an IP address that is in one of these two forms: 10.something.something.something or 192.168.something.something. In fact, a lot of computers will self-assign an IP address to themselves if they're not able to get one from a server: 169.254.something.something. Again, that should indicate to you that that is a private IP address, though usually if you have an IP address like that you might be able to infer that something is actually a little bit wonky. So we can see a couple of other interesting things from this same information. One, there's a subnet mask, and what this subnet mask does is tell my computer what IP addresses should be considered to be on the same network, so sort of adjacent to this computer. So maybe those of you that are using computers, or even smartphones like an iPhone or an Android device, that are connected to the local Wi-Fi: you probably have an IP address, and you will also have a subnet mask that tells your computer what IP address range will actually be considered local. And so you can see that it says 255.255.248.0. What this basically means is that it is going to consider IP addresses in the range 10.249.something.something to be within the same network. And again, I'm oversimplifying a little bit, but that's basically what is
happening. Now, the next one is very important, and it is the IP address of the first router that exists between my computer and the outside world. Every request that my computer makes has to go through this router. It is always going to be the first hop in all of my requests to other machines on the network, just because that is my gateway. That is the way that my computer will be able to make requests to the outside world. Now all of this stuff happens over links, and so realize that there are multiple layers to the internet. We'll talk more about layers next week. Of course, I've been doing a lot of hand waving, saying we'll talk more about it, but we have to have a baseline level of some of this information, I think, before we can start getting to some specifics. But the link itself doesn't really matter. Right now I'm connected to the Harvard network using Wi-Fi, but I could also connect to it using, say, an ethernet cable. And there are other links that exist as well, some that you probably are familiar with, like dial-up, for example. So you would have a modem, and that modem would create a dial-up link between your computer and an ISP, and that would be a specific link. Now all of this stuff happens sort of separately. It happens on top of this actual link. And so we could have, for example, this undersea cable, and that would definitely be another type of link. That is just something that allows us to maintain a connection so that this TCP/IP stuff can actually function. So another thing that is given to me, besides the router, which is the address of the first machine that my computer should address everything to, are the DNS servers, the domain name servers. And so these are their IP addresses. What does DNS do? Basically, there was an analogy for something that we talked about. Yeah, like a phone book, basically. So if we have a
domain name like google.com or cnn.com, we need to know the IP address of the machine that represents that domain name in order for us to contact it. For us to be able to retrieve a web page from google.com, to go to Facebook, to connect to our email server, to connect to our SAM client, any number of things require that our computer first look up the IP address based on the domain name. Now obviously this would be sort of a chicken and egg problem if our domain name servers were named with domain names, right? What would it mean if I had something like dns1.harvard.edu, which isn't a real machine, but is just used as an example? Well, there's no way for my computer to look up the IP address of that. So what we have to use on our computers, in order to configure them properly to use the internet, are the IP addresses of our domain name servers. Now luckily this information is provided to us, but we could actually enter it in ourselves. And in fact a lot of internet service providers years ago would just give you a little sheet with all the information that you had to type in, like your IP address, and also your router, the subnet mask, and the DNS servers, and that is what all of these things do. So right now we actually have a good system that allows us to retrieve this information automatically when we connect to a network, and that is called DHCP, or Dynamic Host Configuration Protocol. In fact, you would have seen that we are using DHCP here under the TCP/IP tab. We can see that it is going to configure IP version 4 using DHCP. So you don't really have to know what DHCP stands for, but you do have to know what it does, and what it does is provide to our computers all of this information that we just talked about. It provides an IP address to my computer. It provides the list of domain name servers. It provides the router that it should use. It provides the subnet mask. It provides the basic amount of information that our computers then
use to connect to other machines on the internet. Now, we don't have to use DHCP. If you knew what you were doing, though I would not recommend doing this, you could do this manually. You could type in an IP address, and your computer will blindly accept any IP address that you put in. So I could put, what did I have, 10.1.249 or something like that? Let's see, 10.249.131. So I could actually type in 10.249.131.41 in the manual box, and it would continue to work properly, assuming that the router and the subnet mask were actually input correctly. But I could also put in something else, like 10.249.131.73, for example, and that is actually a valid IP address within this same network, and so it would continue to work. But there's a very big problem with doing this. Imagine a scenario where, via DHCP or via some manual process, your machine has that same IP address that I just entered manually. What do you think would happen? How does that even make sense? What would happen in the case that two computers on a network have the same IP address? Any ideas? Yeah? It wouldn't even do that. Yeah, it's sort of like two people having the same phone number. Does it ring at both phones? Does somebody pick up first? It's really sort of undefined. It really depends on the router to determine what will happen in that case. In some cases both machines will get both messages; in some cases neither machine will get them. It's really undefined behavior, but basically this is a bad thing. Every IP address has to be unique. Of course there are private IP addresses, but because these are not announced to the internet at large, I could have my own 10.something network at home that's separate enough from Harvard's that they wouldn't interfere with each other. But on the internet at large, and within the same network, you cannot have two machines that have the same IP address. And so configuring manually becomes sort of a nightmare, especially if you're moving your laptop from one
network to another. You're always getting a new IP address, so how do you know which one it's going to be? So using DHCP is something that's very, very good. With the Dynamic Host Configuration Protocol, your computer broadcasts out loud, "I need an IP," something like that, and the server will respond, "Okay, here you go: here's an IP address that's unused, here's the subnet mask you should use, here's the router that you should use, and here are the domain name servers that you should use." And recall, just because it's important enough to mention again, that the domain name servers are IP addresses that my computer contacts to do this sort of phone-book-style lookup. So if I have a domain name like Amazon.com, Google.com or CNN.com, it's going to contact one of the DNS servers above to find out what the IP address for that name actually is. This is pretty important, because without this we would be kind of stuck in the water. Notice that within TCP/IP we can also configure IP version 6. Now, a lot of machines actually have the capability to use IP version 6. A lot of your Windows computers, a lot of your Mac and Linux machines can actually use IP version 6, but the problem is that it's not backwards compatible with IP version 4. So this means that ISPs all have to support it, and very, very few do. I think the latest statistic is something like fewer than 1% of networks are actually IP version 6 capable, so it's a very slow migration. And also, because it's 128 bits versus 32 bits, an IP version 6 address is actually really, really long. It doesn't show it here, but it's a very long address that includes not only numbers but also the letters A to F and some colons, and it just becomes quite a long string of characters for us to use. "Could you sabotage somebody else's computer by changing your computer's IP to match?" Yes and no. Again, it really depends on how the network behaves. You could just be blocking yourself out of
the Internet. That's not going to be a very reliable way, I think, of trying to attack somebody else on a network. There are much more reliable methods, I think, of getting at somebody else on the Internet. I will talk about them generally, also because it is important to know how to protect yourself from these same things that can actually occur to you. Did I see a question back there? Yes, when you move your laptop. Generally, because IP addresses are given in blocks to companies like ISPs or to entities like Harvard, when you move your laptop from one geographic location to another you will receive an entirely new IP address. Now, the exception to that rule might be, again, these sort of private IPs, where I might get a 10.something IP address here at Harvard and, going home, I might also get a 10.something IP address, even though Harvard's computers generally operate on something like the 140.247 subnet, and the Comcast network of which my computer is a part, because I have that ISP, is something like 66.something or 24.something. So my public-facing IP address will definitely change. The degree to which the IP address that I see on my computer changes will be a little bit different. And again, this will make a lot more sense when we talk about network address translation, just because that is a concept that makes this sort of private IP a lot more meaningful. But really generalizing it, every computer has to have its own unique IP address on the Internet, and so yes, when you move from one geographic location to another you will definitely see a different IP address, just by nature of being on a different network. Now, there's actually something that I want to show you: a map of the Internet. Some of you may have heard of this webcomic called XKCD, which is sort of a really geeky nerd comic that has a lot of really fun and interesting topics, and some of the things that Randall Munroe, the creator of this comic, does are actually pretty
interesting. So this is basically a map of the IP version 4 address space. It was about 5 years ago, I think in about 2006, that this came out. Every block that you see represents a class A address, and recall that a class A address is some number in this W position, and then the range of all of the subsequent numbers belongs to that class A address. As you can imagine, there are only 256 of these class A addresses, right, from 0 all the way up to 255, so these are kind of a hot commodity. It's a big deal for a company, or for a country even, to own an entire class A block of addresses, just because this is an enormous number of addresses. This is 256 times 256 times 256, I think that's 16.7 million addresses in a class A block. So having one of these means that you have quite a significant percentage of the IP addresses in the world. All of the green patches that you're seeing were unused at the time, so they had not been given out, and obviously we have now given out all of these, so this is a few years old. But this is, I think, an interesting visualization of this data that we have been talking about. So if we take a look at some of the things that exist, keep in mind that when we see a country name here, that's not necessarily the only block that it might own or that might go to that specific geographic location. The US, for example, actually does have some other blocks as well, like 24 and a whole bunch of the ones in the upper left-hand corner. Just because the Internet was initiated here, we do have a lot of class A addresses. We can see quite a few interesting things, like the USA has a pretty big block up there, but some of the interesting ones, I think, are the class A addresses of some of the big companies, like HP, Apple, IBM: they all own their own class A address. And the only educational institution in the entire world to own a class A address
is MIT. An 18.something.something.something address is guaranteed to be an MIT address, just because MIT owns the entirety of the 18 class A space. And what MIT does is that they actually will give a subnet to an entire building. A subnet is W and then an X, so 18.1, for example, will represent one building at MIT, 18.2 will represent another building. And this is so over capacity, in that every building then has 65,000 addresses to it, that it's really sort of a relic of its time, because back then they really did not care about giving out addresses. You could just have addresses left and right. But it does sort of say something about the excess of the class A itself. Yes? No, the numbers in the IP addresses do not actually correspond to the building numbers, unfortunately, but in their domain names they do actually correspond to the building numbers themselves. So unless you knew what the mapping was, there was no easy way to tell what each subnet meant in terms of each building. But that was pretty useful, and this is still very relevant today. Keep in mind that when you send an email, it actually appends your IP address in the raw headers, and we'll look at the raw headers of an email next week. There's a lot of data that gets sent in an email, and one of the things is your IP address. If you happened to be on, say, MIT's campus, you could actually pinpoint almost exactly where the person was. You could pinpoint at least to the building, and sometimes even a little bit more specifically, just by nature of the IP address that they had sent the email from. And this is something that's very useful as well, because when you visit a web page, any web page in the world, your IP address is sent to that server, and many of these companies actually collect this data because it's useful to them. It's interesting for them to know
geographically where a lot of their users are. For example, if a website realizes, okay, I have a lot of users in the New England area but not much elsewhere, then maybe it would make sense for them to create more servers in the New England area rather than, say, in California. Or maybe there's something a little bit more targeted, like advertising: maybe they want to target advertising for your geographic location. So IP addresses, while they're not a perfect indicator of your location, do actually give away quite a bit of information about where you are located. Now, if we take a look at my IP address here at Harvard, let's see, there's a website called whatsmyip.org. We had talked about this notion of a public and a private address. I can always find out what my public IP address is, so what the rest of the Internet sees as my IP address, by going to a website like whatsmyip.org. We can see that right now, even though we had seen that my computer has a private IP address of 10.something, its public IP address, the IP address that the rest of the world sees, is this 140.247 address. I had mentioned before that Harvard owns the subnet 140.247, so every time I visit a web page, if that web page happens to collect my IP address, a person that knows things about IP addresses could look at this information and say, oh, this is a person that was on Harvard's network at the time that they visited this web page, just by nature of it being 140.247. That might be something that you do or don't care about, because you might not want people to know where you are accessing a website from. Let's say you are visiting China, and China is notorious for having this so-called Great Firewall of China, which does actually exist, by the way. When I was there last, about a year and some months ago, the spring semester was about to start, including this class, and I needed to look up some stuff on the Harvard Extension website. They had
blocked the Harvard Extension website at the time. So I'm like, okay, well, I'll check my email at MIT. I went to MIT.edu, and they blocked MIT as well. There were a lot of addresses that they had actually blocked, and I started realizing that there's nothing stopping people in China from detecting the IP address that was trying to contact these, you know, really bad websites. Maybe they would then be able to know a little bit of information about me. Maybe they would be able to know the city that I was in, or maybe even worse, the building that I was in, and maybe this was information that I did not want them to have. Obviously, going to these websites is not something that I'm really concerned about, but it does say something: when there are very strict regulations from, say, a country at large, or perhaps at work, then this might be something that in some cases you might want to get around. And so there are some ways that you can mask your IP address. We'll talk about them next week, but what's important is that all of the data being sent from your computer to the outside world is not actually all that safe. There are a lot of hops along the way, all of these servers and routers that handle this data, and while many of them are not configured in this way, there's really nothing stopping them from recording all of these requests that you are issuing to other servers and inspecting them to see what sorts of things their users are doing, what sorts of things people are trying to do and are not able to do, and so on and so forth, just things that you may not want other people to know about. I see a question over here. Yes, so how do you own an IP address? There is a body, I think one of them is called the IANA, that is responsible for basically allocating the blocks of IP addresses that exist
and they recently, and this is the reason that you started seeing this all over the place in the news, gave away the last block of free IP addresses. There was some final ceremony in Miami or something a couple of weeks ago, which seems sort of silly. But they basically give away all of these blocks, and I imagine that they are actually sold, because they are a company and I'm sure they want to maintain themselves. But Harvard actually owns the block of IP addresses in the range 140.247, from 140.247.0.0 to 140.247.255.255, so all of those IP addresses belong to Harvard, and Harvard can decide what to do with those IP addresses, if that makes sense. Did I see another question? Is there an annual charge? I'm not too familiar with the business specifics behind it. I suspect that it's probably an annual sort of thing, or it's probably based on a contract: you probably purchase it for a certain amount of time before it gets re-released. But I imagine that now, with blocks being essentially gone, they are popular enough that they'll be snatched up right away if you happen to let go of a block that you own. Now of course, when you go home and you see that you have an IP address, that doesn't mean that you own that IP address. Most likely what has happened is that you have been given that IP address by your ISP, and that IP address can actually change. One of the things that DHCP can do is change your IP address every so often. So you see that I had a private IP here; there's nothing guaranteeing that I will maintain the same IP address when I, say, close my computer, leave, and then come back another day. I may actually receive another IP address altogether. Oh yeah? So how do they identify specific computers, and what sort of information does the computer send? Well, we will look specifically at all of the data that a computer
sends when you make a web request, and it's actually kind of scary, all of the information that is sent across. It includes things such as what type of computer it is, what operating system it is running, what browser you are using. Also, a lot of web browsers store data called cookies, which you're probably familiar with. It was this sort of big scare in the 90s: oh god, cookies are going to be the end of us all. They're really not that bad, but they are used to track you in a lot of cases, where a server will write some cookie data to your computer and your computer will then send that cookie data back to the server, so that server can always identify you as a person. But again, I'm talking generally about this because we'll look at that more specifically next week. It's actually very hard to be anonymous online, in the sense that even though your IP address might change, there's a lot of information that your computer is actually sending out, and there's a variety of ways that people could identify you, either through cookies or perhaps through your IP address, if it hasn't been too long since you last contacted that server from that IP address. All these sorts of things can mean that somebody can identify you. It really doesn't matter in the context of us sitting here right now: Harvard really doesn't monitor their network, as far as I know, and ultimately we're not under any sort of governmental restraint not to go to specific websites or that sort of thing. But it does matter if you are in a case where a website that you want to visit is disallowed or blocked in some way. In fact, one of the ways that you can typically get around blocks like this is with something called VPN, or virtual private networking. Again, you know, hand waving, we'll talk more about this later, but using VPN you're essentially creating a virtual tunnel from your
computer to another set of computers somewhere else. Harvard actually has a VPN, where you can create a VPN tunnel between your computer and Harvard's servers so that you receive a Harvard IP address even though you are on a completely different network. I mention this to sort of bring this idea full circle: when I had been in China and I needed to access MIT.edu and Facebook, which was also blocked by the way, one of the ways you can get around this is by using a VPN. You're creating this encrypted tunnel, and while I was sitting in China I then had the IP address of my home computer and was able to contact any websites that I wanted to without worrying too much about it. But again, recall that this is sort of advanced stuff that we will talk more about next week, after some of this Internet basics stuff has settled a little bit more. The main takeaways for now are that these IP addresses are unique, that they uniquely identify a machine on the Internet, that using them allows us to communicate from one machine to the next, and that if your machine moves geographically, then most likely your IP address is going to change as well. Okay, so we can see that there are quite a few interesting things here. Now, a lot of class A blocks are actually divided up among a variety of countries. So while companies like Apple and HP and IBM have their own class A blocks, and MIT has its own class A block, some entire countries do not even have access to a class A block. They might only have access to a subnet or some smaller number of IP addresses, just based on however it has been divided up. This is part of the primary reason that this is sort of a concern, in that with IP version 6 we won't have this sort of constraint that we do with IP version 4. Now, going back to this idea of IP addresses and domain names, a lot of these are really intrinsically tied together, this idea of having
domain names that then map to one or more IP addresses but another XKCD comic that's actually pretty interesting is the map of the internet in terms of domains and so these there are two versions of this does he call it map of the internet map of online communities in this case there were two versions this was a version from spring of 2007 so now this is about four years old almost and you can actually see the size of various communities based on their user base and at the time something that now you can probably see sticks out like a sore thumb is MySpace which is now sort of all but you know defunct at this point but you can see at the time that was huge compared to some places like Facebook and even AOL still has a presence it was still somehow viable at the time or was still still around at the time there's a whole bunch of really interesting things that really indicate the size of a lot of these websites also so let's see there's Wikipedia down there and some other stuff let's see is there anything interesting AOL as we saw Friendster which is well what happened to Friendster Zanga, Facebook MySpace all of these things now the 2011 or the late 2010 version I forget is this and obviously it's supposed to be much bigger but take a look at Facebook the size of Facebook now compared to everything else and the illustrator of Randall Monroe even went so far to compare online the the communication that happens online compared to that of all human communication ever and you can see that in these sort of sub little these little sub things here so we can see that okay what we're seeing this is that this entire map right here with Facebook and then these little islands representing pretty much the rest of the internet and then you can also see that email sort of this huge glob up here and SMS text messaging that's also extremely important but in terms of spoken language cell phones and the internet are this small percentage of of the entirety of communication that exists 
in humanity right now. I've got to say, this is actually pretty big, I think, for all of this technology to have that big of a chunk relative to spoken language, because I'm standing up here and blabbing for two hours, and that's got to be quite a big percentage of this big thing. Okay, so let's take a look around at some of this. Here, of course, Facebook is absolutely massive. Farmville is even sort of its own thing, you know, the plains of Farmville over here. Let's see, a whole bunch of stuff: World of Warcraft is over there, YouTube is pretty big, Twitter is also relatively large and certainly important in a great number of recent historical events, including all the Egyptian stuff that happened. Moving on, you can see the relative size of MySpace compared to Facebook, much smaller than it used to be just four years ago. Now we can see some other stuff, and one of the interesting things is that up here in North America we're sort of, not protected, but really sort of separate from all of the Internet subculture that exists in, say, China, where they are very heavily blocked from things like Facebook. Facebook is certainly disallowed in China, but they have their own version of it, which I think is called renren.com. Let's see, I think this is it. Yeah, this is it, and this is basically the Chinese version of Facebook, and look, the green little button even looks a little bit like Facebook's. They probably have a couple of competitors there, but they basically have their own. Let's see, they have their own instant messaging called QQ, and that's what this big thing is right here. That's this sort of chat client. I'm not too familiar with it, but it's this method of chat between people, and that also represents a big chunk of it. And there's even some
sub stuff as well. You can see that right here there's a little box for forums, which includes such things as 4chan, which is of course, you know, sort of the red light district of the Internet in some ways, and a whole bunch of these other forums that are actually relatively large but not as large as 4chan itself. So anyway, I recommend that you take a look at this. It's actually a pretty interesting way, I think, of looking at all of the information that exists on the Internet. So let's recap some things. Realize that we've been talking a lot about some of these various services that exist on the Internet today, but what actually happens is a pretty concrete number of steps. Let's say I go to a cafe, or I bring my laptop here, and I put it down on the table and I open it up and I am trying to connect to a network. Well, the sequence of steps is something like this. First, the link has to occur: say, the wifi link between my computer and the access point has to occur, or I have to physically connect the cable into my computer, if that's the type of link that I have. Once I have a link, the computer knows, okay, I don't have an IP address, so it uses DHCP. Recall that we have a whole bunch of acronyms; it doesn't really matter what the acronyms stand for in this case, but it matters that you know what they do. DHCP will then call out to just whoever will listen, "I need an IP address," and it will receive an IP address, a subnet mask, DNS servers and the router information as well. That's all the basic information that your computer needs to be able to access the Internet. Then from there, let's say you open up a web browser and you want to go to a web page. Well, you type in something like cnn.com or google.com and you hit enter. What your computer does is it then contacts one of the DNS servers, which will then do the lookup for this domain name, whether it be Google, CNN, Amazon, what have you, and will return the IP
address of that server back to your computer. Then, in a separate request, your computer will initiate an HTTP request, as it's called, to contact either Google or Amazon or CNN, and will ask that server for the web page. That is sort of its own whole bunch of stuff, and that, along with some of the more advanced topics like NAT (network address translation), VPN, routers, switches, all that stuff, we will talk about next week. Not next week, but two weeks from now. So until then, enjoy your holiday next week, and we will see you in two weeks.
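
For anyone who wants to poke at the ideas from this lecture on their own machine, here is a minimal sketch in Python. It was not shown in class and is only an illustration under a few assumptions: the helper names resolve and describe are made up for this example, the 140.247.0.0/16 range is the Harvard block mentioned in the lecture, and the lookup step relies on whatever DNS servers your computer was handed (for example by DHCP). It checks the address-space arithmetic (a class A block versus one W.X subnet), tests whether an address is private or inside a given block, and shows the phone-book-style lookup from a domain name to IP addresses.

```python
import ipaddress
import socket


def resolve(domain_name):
    """Ask the system's configured DNS servers for the IPv4 addresses of a name.

    Hypothetical helper for illustration; it needs a working network connection.
    """
    infos = socket.getaddrinfo(domain_name, 80,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip_address, port) pair.
    return sorted({info[4][0] for info in infos})


def describe(address, network="140.247.0.0/16"):
    """Report whether an address is private and whether it falls inside a block.

    The default block is the Harvard range mentioned in the lecture.
    """
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    return {"private": ip.is_private, "in_network": ip in net}


# A class A block fixes only the first number (W): 256 * 256 * 256 addresses.
class_a_size = ipaddress.ip_network("18.0.0.0/8").num_addresses

# A W.X subnet, like one building in the MIT example: 256 * 256 addresses.
building_size = ipaddress.ip_network("18.1.0.0/16").num_addresses
```

Calling describe("10.249.131.41") reports a private address outside the Harvard block, which matches the private versus public distinction from the whatsmyip.org demo. Calling resolve("cnn.com") on a connected machine returns whichever addresses your DNS server hands back, and there may be several of them, which is the redundancy the traceroute warning pointed out.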