 All right, welcome back to Computer Science E1. This is lecture three, and it's our first of two looks at the internet. So we thought we'd start with a technological demonstration of how data actually traverses the internet. And then we'll dive in deeper and see how it actually gets from point A to point B. So most every one of you today probably sat down in front of a computer, pulled up a website, and went to somewhere.com. Or probably most every one of you pulled up your email program or your favorite web mail program, favorite web mail website. Started typing an email, hit Send, and voila. Some number of milliseconds, microseconds, seconds, minutes later, that email arrived at its destination. But what exactly is between you, point A, and the recipient, point B? Well, this black window here is an example of a Linux computer that's running elsewhere on campus. It's kind of an old school type computer whereby there's no graphical interface that I'm using. It's just a so-called command line environment, or command line interface, CLI. And what that means is anything I want to do with this computer, I have to type at the command prompt. Some of you might recall DOS from yesteryear. It's a very similar idea. But there's a command on this computer called TraceRoute, which as the name implies, allows you to trace the route between two points, A and B. Well, I am going to be implicitly point A. And point B, let's say, is one of my favorite places to visit for the day's news, cnn.com. When I hit Enter, my laptop here on Harvard's campus is going to send a request across the internet to cnn.com. And along the way, it's going to time how long it takes to get from point A to point B. So let's see what's in between us and cnn.com. It looks a little cryptic at first. And in fact, it seems to have paused on one line. But let's see if it makes some forward progress. And it very rapidly did. Looks like 18 steps later, 23 steps later, I'm presumably closer to my destination. 
Now, this is an imperfect demo, knowingly, because some of the computers between me, point A, and CNN, point B, are actually kind of hiding their identity. They're not responding to my requests using this particular tool. Hence, these stars here. So let's just focus for a minute on the lines that did actually appear OK here on the screen. So the very first line here actually has this thing, which we'll talk a bit more about later. But this is a numeric address. It's an IP address, Internet Protocol address. And that is my computer's IP address. What that means is that my computer, like almost every computer on the internet, has a unique number associated with it. And it's a number of the form x.y.z.w, where each of those letters is actually a number between 0 and 255, with a few exceptions. Just like a US postal address uniquely identifies your home or your PO box somewhere in the world, so does an IP address uniquely identify computers on the internet. And we'll see later today that that's actually a bit of a white lie, a bit of an oversimplification, but it'll get the conversation here started. So that address there that I've highlighted is my so-called IP address. And to the right of it are a few measurements, which we'll see in more detail in a moment. Then in step two, apparently there's something else. It ends in harvard.edu. It's core one GW v out something or other harvard.edu. And then in line three, there's another Harvard something or other. And then we get some stars and then more. What might each of these lines in this program's output represent? Any guesses? Each of these represents some kind of computer, apparently, between me, Harvard, and point B, cnn.com. So they are... OK, I just said they're servers. Or I just said they're some kind of computers. They're kind of servers, but there's a special name given. And you've probably heard this before. Yeah, so it's a router.
So you might have heard that there are lots of these things called routers on the internet. You can kind of infer from their name what they do. They route data from point A to point B. And for now, you can think of them as just fairly simple devices that have a really big database, a really big Excel file inside of them that says, when you get data from here, route that data over here. If you instead get data from over here, route that data over here. In short, there's some kind of table or database that says if information is destined for this address, send it that way. If it's destined for another address, send it that way, and so forth. So routers route information. Now there appear to be multiple routers between me and CNN. And that conceptually probably makes sense. Because we don't live next door, so far as I know, to CNN. So it makes sense that there's going to be a bunch of hops, a bunch of routers between point A and B. And what we can see is that apparently the first such hop in line number two is a router that belongs to Harvard. In step three is another router that belongs to Harvard. Step four is unclear. They're kind of hiding their tracks. But it gets interesting now in step five. Whose router do we appear to be using in step five there? So Qwest. So Qwest is a big ISP, internet service provider. They're one of the backbone providers, the really big fish out there, that just have lots of bandwidth capacity, lots of routers and such that people like Harvard and even other entities connect into. From step five, we then head to where? Six is kind of unknown, but seven is owned by who? So also by Qwest, but what's kind of neat is that the sysadmins who run routers on the internet, they kind of adopted this common convention years ago where they use airport codes to identify routers' locations. So go back now to step four or step five. Where is step five? Where is that router probably? So Boston.
So BST is actually not our airport code, but you can infer it nonetheless. Feels like Boston. What about step seven? That's an airport code, Newark, New Jersey. So it appears that there is a connection. There's a wire. There's a satellite connection. There's some form of connection between lines five and seven. And that's how data is getting from one router in Boston to another router in Newark, New Jersey. Now admittedly, there's probably another guy in the middle who's masking his identity, step six. But so be it. It still went a few hundred miles pretty darn quickly. Step eight seems to be, oh my god, where? Yeah, so feels like San Jose. Well, let's see. I wonder if there's another San Jose there, unless the data is really going off into nowhere there. So step eight mentions San Jose. Step nine mentions Newark, which makes me a little suspicious that maybe there's a San Jose that's not in California that I'm not aware of, because it feels like we should end up on the same coast here. Because line 10 has us in Newark. Line 11 has us in probably Washington. Now it feels like Washington. And then line 12 feels like Washington. So we seem to be staying here on the east coast. And then finally, we get down to line 15 toward the bottom. Where does CNN server appear to be, or might be? So feels like it might be in Atlanta, Georgia, just based on these names. Now maybe it's a little further away, because they're being a little clandestine down there. But the point is, in just milliseconds, did data get from point A Harvard to point B Atlanta? How many milliseconds? Well, all of these times that I mentioned a moment ago, 28.413 milliseconds, 48.204 milliseconds. That is literally how much time it takes for internet traffic to get from here all the way down to Georgia. Any questions on this output? Well, let's see if we can't reinforce this and make the world seem an even smaller place. 
Instead of going to cnn.com, I'm going to go to a Japanese version of their site, cnn.co.jp, where the .co often stands for company. So let's see how my web browser would connect to the Japanese day's news. So steps one, two, and three seem pretty similar. Then we get a whole bunch. In fact, we made it all the way to step 17. Now let's see what we can glean from this. So steps one, two, and three were the same. Step five? Anyone know what nox.org is? You wouldn't. You shouldn't. But Northern Crossroads. This is a really big peering point, whereby lots of ISPs all kind of connect in the same place so that data can then go out in different directions. So that's what line five is about. Line seven is another Northern Crossroads. Line eight? Actually, yeah, line eight. Finally, this is not the same San Jose as before. Line eight seems to be where? In California. And I'm inferring that because step nine seems to belong to a company whose name is? OK, 10 gig. Well, that's probably 10 gigabit Ethernet. That's describing how fast their connection is. But Asia Netcom, something like that. But here's what's interesting. And I realize the font is big. So these lines are getting a little complicated because they're wrapping. But what's interesting about line nine vis-a-vis line ten? Does anything jump out at you? What's that? They take a lot longer. So between nine and 10, in fact, the times go up, and they stay up. If you look at 11, 12, 13, all of those millisecond values, they've really shot up from the couple dozen milliseconds we saw a moment ago. Why? Did the internet just get slow after San Jose? Distance. Distance? What distance? What's between us and cnn.co.jp, most likely? The Pacific Ocean is between us. So that probably explains the jump: from seven milliseconds in line eight, to 73 milliseconds in line nine to get across to the West Coast, to line 10, where it's, oh my god, 186 milliseconds to get from point A to point B. And that's because point B this time is much farther away.
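As a rough illustration of reading those numbers, here is a minimal Python sketch that takes the three round-trip times quoted above for hops 8, 9, and 10 and reports where the biggest increase happens. The function name `largest_jump` is made up for this example, and real traceroute output reports several measurements per hop rather than one.

```python
# Round-trip times in milliseconds, as read aloud in the lecture for
# hops 8, 9, and 10 on the way to cnn.co.jp. Illustrative only.
hop_rtt_ms = {8: 7.0, 9: 73.0, 10: 186.0}


def largest_jump(rtts):
    """Return (from_hop, to_hop, increase_ms) for the biggest increase
    between consecutive recorded hops, or None if there are fewer than two."""
    hops = sorted(rtts)
    best = None
    for prev, curr in zip(hops, hops[1:]):
        increase = rtts[curr] - rtts[prev]
        if best is None or increase > best[2]:
            best = (prev, curr, increase)
    return best


print(largest_jump(hop_rtt_ms))  # (9, 10, 113.0)
```

With these three values the largest increase falls between hops 9 and 10, which is consistent with that being the hop where the data crosses the Pacific.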
So this is all well and good for really big globalization. But what about you actually connecting your computer to a home network? So you realize that you probably connect to your home network either using wireless or via wired connection. We'll talk about those in a little bit more detail later. But a lot of this same thing happens close to home as well. So even when you are just sitting at your computer and you don't even have to be connected to some sort of arcane server like this for this same sort of thing to happen. And specifically if you open up a web browser on your computer and you type in cnn.com, for example, and you hit enter, what happens? And that is what we're going to be trying to answer for you today. So basically you can imagine that we have to contact this server cnn.com somehow. And David mentioned this idea of IP addresses where every server on the internet has its own IP address. And this is really, really vital for you to understand that IP addresses, there's one pretty much, again, it's sort of a white lie, but there's one for pretty much every computer that exists on the internet today. And this allows communication between the two because you wouldn't know, for example, how to send mail to your parents' house, for example, if you didn't know the address and it's the same sort of thing. The IP address tells these computers and these routers how to be able to transfer data from point A to point B. But as you also probably know, we don't actually go to websites by typing in IP addresses. We haven't had to type in like 203.169.9.33 into our web browser to be able to access that website. In fact, what we see, what we do instead is to go to something like just cnn.com, right? So instead we type in this, just something that's very user-friendly. It's a name rather than an address. So this implies that there must be some sort of system underneath it that converts from these friendly names into these IP addresses. 
And this system is called the Domain Name System. And in fact, pretty much all websites that you go to will have their own IP. They're just usually masked from you. You don't have to know what they are, but you can find out what the IP address of any particular website is pretty easily. There's a variety of tools to do this. And if you are on DOS or even if you have a Mac, you can open up the same sort of arcane window and type in some commands and be able to see the same information. But if we just type in host cnn.com, for example, we see that we get cnn.com has address 157.166.226.26. And there's a variety of other ones here as well. And why might this be? Why would we have more than one address associated with cnn.com? Right, so CNN probably has a lot of different computers responding to people that go via their web browser to cnn.com. So when I hit enter on my web browser, it contacts the server, the server that's owned by CNN. And CNN responds by giving me a web page that I'll be able to display on my web browser. But behind the scenes, what you don't see, again, is what is happening here. Your computer, when you hit enter, asks a server via this system called DNS. It asks what's called a DNS server what IP address cnn.com represents. So it is then able to find out, OK, cnn.com has this address. And then it knows, OK, now this is the actual address that I want to be able to contact. And it will go and contact that IP address using the same sort of hops that you saw via traceroute. So another way to think of this Domain Name System then is like a big address book. So just like when you wanted to look up the address of a particular company because you wanted to send them a letter or something like that, you would open up an address book. Or rather, an address book would work, but more like the large Yellow Pages or something like that, and you would find that company, you would find their address, and you would be able to know where to send the mail.
It's the same sort of thing. It's an analogy here on a computer system to be able to contact each of these machines and find out where we should be sending our data, where we should be sending these requests. Yes? That's a really good question. Maybe let's defer for a little bit just so we have a few more building blocks and then come back to that question. It's a good one. Yes, so what I'm talking about here isn't necessarily a filter. It's more of a lookup. So it's more like you want to know the address of a specific company and you're doing a lookup for that company. I suppose you technically could, just like if somebody tore off half of the Yellow Pages and gave that to you, you'd be able to filter that way. But that's typically not what's done in this scenario. Typically if you ask a DNS server what IP address is associated with a name, then you get that information back. And one of the things that's particularly interesting is that when you are on your home network and you load up, say, your network settings, and you've probably had to do this a couple of times if you've ever called technical support and they tell you to load up this window and tell them the information that you see, you notice that there's a variety of things available here, including IP address, subnet mask, router, some other stuff. But what we're talking about are the DNS servers. So we'll talk about some of these other servers that exist as well and some of these other numbers that you will see in a typical window involving these sorts of network settings. But in this case, DNS servers are provided to your computer through a variety of means.
And a few years ago, your ISP, your internet service provider would typically tell you what to type. They would say, okay, open up your settings window and type in this number into DNS. And this makes sense because we don't know on our computers what CNN.com is. We have to ask somebody else, somebody who might know what CNN.com refers to in terms of its IP address and how are we going to figure that out? We have to have some server somewhere, some computer that knows what each of these IP addresses are. It has a mapping of the CNN.com to its IP address and it is these servers then that we can contact. So we could contact any one of these three servers and ask it that question. We could contact this server 140.247.233.200 and say, oh, hey, I need to know, I need to contact CNN.com. What's its IP address? And the server would then respond with the IP address of that machine. So, yes? Are those DNS servers, are they run by your ISP? So, yes, so typically DNS servers are run by your ISP. In this case, two of these servers seem to be run by Harvard, just based on their own IP addresses. And each of your own ISPs at home will have typically their own DNS server even though there's a whole bunch of DNS servers all over the world that contain sort of the pieces of each of this information. Yes? Your computer is asking the router to ask the DNS server or you could just go to the DNS server? So there's multiple layers that happen here. So when we're talking about IP addresses, for example, we might send some data and we'll target that data to a particular IP address. Now, the message that we're sending might be any number of things like it might be, oh, I want this website, for example, or I want to do a DNS lookup for this domain name. A domain name is just a fancy word for CNN.com, Harvard.edu, just these sorts of names. And so the message then is independent of this system that routes it to the appropriate server. 
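The lookup described above, where a DNS server holds a mapping from names to IP addresses and answers questions about it, might be sketched as a toy Python "address book". This is only an illustration of the idea, not how a real DNS server is implemented: the `ADDRESS_BOOK` dictionary and the `lookup` function are hypothetical, the first address is the one shown in the lecture, and the `192.0.2.x` entries are placeholders from a range reserved for documentation.

```python
# A toy "address book": one name can map to several IP addresses.
# 157.166.226.26 is the address shown in the lecture; the 192.0.2.x
# entries are made-up placeholders, not CNN's real addresses.
ADDRESS_BOOK = {
    "cnn.com": ["157.166.226.26", "192.0.2.10", "192.0.2.11"],
}


def lookup(name):
    """Answer the question: what IP addresses does this name have?"""
    # Domain names are case-insensitive, so CNN.com and cnn.com match.
    addresses = ADDRESS_BOOK.get(name.lower())
    if addresses is None:
        raise KeyError(f"no record for {name}")
    return list(addresses)


print(lookup("CNN.com")[0])  # 157.166.226.26
```

On a real machine, Python's standard `socket.gethostbyname_ex("cnn.com")` hands the same question to the system's resolver, which in turn consults the configured DNS servers; the answer depends on the network and will differ from the toy data here.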
So in this case, our computer knows that we have a couple of servers that it can ask for this information. And then just through this, for right now, magical power of TCP/IP, it will send that question through a variety of routers. I want to know what cnn.com is. And then those servers will respond to my IP address with the answer to that particular question. Now, you'll notice that these DNS servers aren't actually listed by host name. So in other words, it's not like dns1.harvard.edu or something like that. And hopefully the reason is obvious, in that in order for TCP/IP, for this IP system to work, we have to be able to contact the IP address of this machine. We won't know what dns1.harvard.edu means in terms of an address, unless we have another machine to look it up. So these DNS servers, you will always see as an IP address. Okay, so let's draw a picture. Since we started out by looking at the big picture, so to speak, the internet and how data gets from point A to point B, Dan's introduced how data actually starts to go from point A to point B in the context of DNS. So let's make things more real now. So you all at home, odds are almost everyone in the room has internet access of some sort. In fact, a quick survey, who has a cable modem at home? Okay, so like 70%. And how about DSL? Okay, and how about I don't know? It's okay, all right, excellent, so good. One or two, three outliers, okay. So there's various ways in which you can get on the internet. So you seem to be familiar with these as a consumer. The technologies are a little disparate, but perhaps we'll dive in deeper in just a minute. But for now, suffice it to say, you can pay someone some number of dollars per month. They give you some kind of device that you install in your home, and voila, you plug your computer or your computers into that device and you're on the internet.
Well, what's actually going on and how does it relate to the backbone we looked at a moment ago and DNS, which we've just now looked at? So here is my best attempt at drawing a computer. It's a bit retro, but that's what computers used to look like. So this is me or this is you at home. So you have Verizon, you have Comcast or someone similar. What do they hand you when you sign up for service? What is it called? So a modem of some sort. And this word has been around in various forms for years. But for now, cable modem, DSL modem, functionally, they achieve the same goal, even though they work differently. So this device is going to look like this. And sometimes when you get these devices, they're actually combined with this device called a router. So we talked about routers with my little demonstration a bit ago. But those are sort of big, hulking machines that route lots of data from point A to point B. But routers can also be small, personal devices that live in my home and in your home. And the purpose there is to get data off of the internet and then dispatch it to one or more computers that you own, and then vice versa. So same idea. It's still routing data, but on a smaller, much cheaper scale. So I'm going to also draw a pair of antennas on this thing just to convey the idea that some of you might have cable or DSL modems that are all wrapped up in one device that also provides you with wireless service. But realize some of you might actually have two devices for the same purpose. So now my computer is somehow connected to this cable modem. We'll go retro, and I'm going to use a physical Ethernet cable for your desktop computer. If it's wireless, it just goes through the air for now to these little antennas. And what is this cable modem, for those of you with cable modems, connected to? What did you plug it into? What did the person who came to install it plug it into? So it's the cable jack. So it's what's called a coaxial connector.
It's that sort of retro metal type thing that you just screw or push the prongs on to, and it holds it there. And it's the same kind of jack you use for cable TV because they're using the same physical infrastructure for TV and also for cable modem services. So this goes into a little jack on the wall. If you have a DSL modem by contrast, this isn't a little circular metal plug, but rather what? If you have a DSL modem, it's a telephone jack. So they're using a different infrastructure, but again toward the same end. Now this thing is now connected to some fancy equipment in your basement or the side of your house or the big green box down the street or underneath the sewers, various different places in your neighborhood. But ultimately, it leads to this thing, which we began tonight's discussion about, the so-called internet. And then somewhere else on the internet, this process kind of reverses if it's a home user or if it's a company. They probably have fancier equipment. So if this here is cnn.com's offices somewhere on the internet, whether in the US or Japan, they are also connected to the internet in some way. So when Dan says that your computer, in order to go from here and contact cnn.com, needs to go through DNS, well, what does that mean? Well, when I sit down at this computer, type cnn.com into my browser, hit Enter, my computer has no idea, a priori, where the heck cnn.com physically lives. But thankfully, my computer has been configured by someone to know who does know the answer to that question. And what is the name of the type of server that knows the answer to the question? Where is cnn? So the DNS servers, exactly. So somewhere in here, and we simplified a little too much, somewhere in here is my so-called ISP. So sorry if you're drawing in pen. 
Somewhere in here, let me interpose my cable modem or my DSL company, and this will be Comcast if you're local here, or Verizon, or someone like that, and they, in turn, are connected to the internet, and I, in turn, am connected to them. Now, inside this big office building, or whatever infrastructure they have, are some computers called DNS servers. Now, these are probably not even big computers. They're not laptop size, but they're certainly not room or refrigerator size. They're just decent-sized computers that a human can pick up these days. They're not huge, even though they do big things. Well, inside of this building are DNS servers. But my computer knows how to contact those servers, because it knows, per Dan's computer, the what of those DNS servers? The IP addresses, exactly. Now, as Dan pointed out, in yesteryear the cable modem company or DSL modem company used to literally give you a piece of paper with some arcane numbers on it, and you or they would type those numbers into your computer and then hit Save, and then your computer knows the IP addresses of those DNS servers. Thankfully, the world's a little more dynamic these days. And in fact, they support a protocol, a language, called DHCP, which even if you don't know what this means just yet, odds are you've seen this acronym somewhere in your computer at some point. It stands for, but this is not interesting for us, Dynamic Host Configuration Protocol. And as an aside, in E1, there'll often be jargon, there'll often be acronyms. It's fine to remember what these things expand to and what they mean, especially if it helps you understand what the concept is. But it's not impressive just to rattle off what these acronyms stand for. Knowing what the thing actually is, that's what's useful.
DHCP, as the D implies, Dynamic, is a language, a protocol that two computers speak, the ISP's computers and my personal computer, so that when I boot up my computer, I essentially ask the world, hi, I'm ready to go, tell me what I need to know. The DHCP protocol says Comcast will respond to that plea for help with a handful of numbers, among them DNS server IP addresses. In fact, DHCP will also give me something else. What do I, per my first claim tonight, need to have in order to even talk on the internet? Exactly, an IP address of my own. So DHCP will also assign me an IP address that I can then use for the duration of my session. Now, very often you'll get the same IP address hour after hour, day after day, even for weeks or months on end, but it's not guaranteed. And the fact that it's not guaranteed makes it harder for normal people like us to actually run servers, websites, and email servers in our own homes, because potentially our IP address is a moving target. So once I've typed cnn.com and hit Enter, my computer, because it's already booted up, queries this DNS server and says, hm, this user who's typing on my keyboard needs to know the IP address or IP addresses for cnn.com. What are they? Hopefully Verizon or Comcast knows. They respond with the answer. Then my Mac or PC says, OK, I now need to request the day's news from not cnn.com, but from this IP address, 1.2.3.4, or whatever it happens to be. My computer then sends another request out onto the internet. It goes through Comcast or Verizon's infrastructure. It goes out onto the internet. And then what does it bounce between at that point? What's in here? Bring it home, tie it all together. Routers, right? So our example right from the get-go tonight had all of these hops, all of these routers between points A and B. Where are all those points? They're inside of here.
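The table inside each of those routers, described earlier as "if information is destined for this address, send it that way," could be sketched in Python roughly as follows. Everything in `ROUTES` is hypothetical: the networks and the next-hop labels are invented for illustration. The sketch also assumes the usual rule that when several entries match, the most specific one wins, which the lecture does not go into.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to send traffic).
# The networks and labels are invented for this example.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream ISP"),  # default: everything else
    (ipaddress.ip_network("140.247.0.0/16"), "campus network"),
    (ipaddress.ip_network("157.166.0.0/16"), "link toward Atlanta"),
]


def next_hop(destination):
    """Pick the next hop for a destination IPv4 address string."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # Assumption: the most specific match (longest prefix) wins.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("157.166.226.26"))  # link toward Atlanta
print(next_hop("203.169.9.33"))    # upstream ISP
```

The `0.0.0.0/0` entry matches every address, so traffic for any destination the router has no specific entry for still gets sent somewhere.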
So there are these nice, expensive, fast computers called routers in the world that are owned by all sorts of people and companies administered by all sorts of different people. But they're all designed to get data ideally closer to point B from point A and to get them there in 30 or fewer hops, generally. So that when the data finally reaches cnn.com, inside of my original request is casually give me the day's news. And it's more specific than that. There's a cryptic sequence of commands that are sent. But it says, give me the day's news. And then this computer knows actually how to reply to me and only me, because what did I have the foresight to write in the return address, so to speak, of the envelope I just sent across the internet? My own IP address. So much like you would address a personal letter to someone using postal mail with their address on the front and your address in the corner, same idea. It's a virtual envelope containing a request for information, but it's ultimately the same idea. So there's a little bit more information that DHCP will also give you. So yes, it does give you an IP address. So your computer has something so that it can communicate with the outside world. And yes, it also provides DNS servers so that your computer is then able to look up the IP addresses of whatever interesting servers you might want to contact. But another thing that you might notice that DHCP also provides is a router number. And this is the first router that your ISP provides. And this is where your computer will send all of the data and then that router will decide from there what hops to take next. So just like each of these routers in this internet, this sort of nebulous cloud internet that exists right here, just like each of these have a table where they say, okay, all messages destined for this IP address should go this way. And all messages destined for this IP address should go this way. 
Our computer also knows that because we have one router, we know of one router that exists in our ISP that we should send our data to that router. And from there it is then sent further along the internet. Any questions on this so far? Now one thing that's particularly interesting is that you don't actually have to use DHCP when configuring your computer. And in fact, back in the old days, you might have had to do this manually and type it yourself. Now this isn't something that I would generally recommend doing because imagine what might happen if I just type a random IP address in here and somebody else already owns that IP address. What can happen? Yeah, so a conflict, exactly. So sometimes some computers will actually tell you this IP address is already in use and then you will know then that that's not a good thing. But if we allowed our computer to operate with an IP address that already exists elsewhere, then what can happen? Yeah. Right, so the data could be sent to either one of the computers or to both. And this would just not be a good thing. It's as if we had two houses on a street and they both had the same mailing address so now the postman doesn't know where to send the mail to. He might bring it to one or to the other. Hopefully he would make a smart decision about it. But it's the same sort of thing here. The routers might get confused and not quite sure about where they can send the data. And so this is yet another problem that DHCP solves for us, is that it takes a lot of the guesswork out of having to type in these numbers. We don't have to try to figure out what IP address has been assigned to us and we can just allow smarter computers that already know what IP addresses have already been assigned tell us which one we should actually use. Now we didn't go into a lot of information about IP addresses, but I think it's interesting to make mention of this. 
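To make the DHCP idea concrete, here is a toy Python sketch of a server that hands each computer an unused IP address along with the router and DNS server addresses, so that no two computers end up with the same one. This is not the real DHCP protocol: the class `ToyDhcpServer`, the `192.168.1.0/29` network, the router address, and the client names are all assumptions made for illustration; only the DNS server address `140.247.233.200` comes from the lecture.

```python
import ipaddress


class ToyDhcpServer:
    """Hands out addresses from a pool so that no two clients collide."""

    def __init__(self, network, router, dns_servers):
        self.router = router
        self.dns_servers = list(dns_servers)
        # hosts() yields the usable addresses in the network; the router's
        # own address is kept out of the pool.
        self.free = [str(ip) for ip in ipaddress.ip_network(network).hosts()
                     if str(ip) != router]
        self.leases = {}  # client id -> assigned IP address

    def request(self, client_id):
        """Return the settings a client needs: its IP, the router, DNS servers."""
        if client_id not in self.leases:
            if not self.free:
                raise RuntimeError("address pool exhausted")
            self.leases[client_id] = self.free.pop(0)
        return {
            "ip_address": self.leases[client_id],
            "router": self.router,
            "dns_servers": list(self.dns_servers),
        }


server = ToyDhcpServer("192.168.1.0/29", "192.168.1.1", ["140.247.233.200"])
desktop = server.request("davids-desktop")
laptop = server.request("dans-laptop")
print(desktop["ip_address"], laptop["ip_address"])  # 192.168.1.2 192.168.1.3
```

Because the server keeps track of which addresses are already leased, a second request from the same computer returns the same address, and two different computers never receive the same one.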
David mentioned earlier that IP addresses basically have four numbers separated by periods. So we have like w dot x dot y dot z, and each of these numbers can be in the range from zero to 255. Now does anybody remember the significance of this? Zero to 255? Yeah, it's a byte of data. So we basically have four bytes of data in an IP address, and it's just sort of an interesting thing to make note of, that we have four bytes of information that can make up an IP address. Now this might sound like we have a lot of numbers then, because we have 256 different possibilities for each one of these values here, w, x, y, and z. But when you work it all out, if you multiply 256 times 256 times 256 times 256, that tells you how many unique IP addresses we actually have. So in other words, that's just everything in this complete range from zero dot zero dot zero dot zero all the way to 255 dot 255 dot 255 dot 255. Now if you multiply all these together, you find out that we only have about four billion IP addresses available to us in this space. So I guess that might sound like a lot at first glance, but who knows what the population of the world currently is? Six billion? Yeah, six billion. So already we have more people than we have IP addresses, and this isn't that big of a deal because obviously not everybody has a computer, but in another way it is a big deal because we have these servers. So it's not like we just have one computer that's neatly associated with every person. We have, for example, cnn.com, and they have multiple servers that they own. Like we saw when we looked up the host information, just cnn.com, for example, has six different servers that they use. And you also might have more than one computer. You might have a laptop, for example, and maybe a desktop at home. Maybe you have a cell phone that can connect to wifi, for example, and that will also use an IP address.
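The arithmetic above can be checked with a few lines of Python. This is just a sketch: the helper `to_int` is a made-up name for illustration, and it treats the four numbers of an address as the four bytes of one 32-bit number.

```python
import ipaddress

# Four numbers, each with 256 possible values (0 through 255).
total = 256 ** 4
print(total)  # 4294967296, a little over four billion


def to_int(dotted):
    """Treat w.x.y.z as four bytes and combine them into one number."""
    w, x, y, z = (int(part) for part in dotted.split("."))
    return ((w * 256 + x) * 256 + y) * 256 + z


print(to_int("0.0.0.0"))          # 0
print(to_int("255.255.255.255"))  # 4294967295, the last of the 256**4 values

# Python's standard ipaddress module performs the same conversion.
assert to_int("157.166.226.26") == int(ipaddress.ip_address("157.166.226.26"))
```

Counting from 0 up to 4,294,967,295 gives exactly 256 × 256 × 256 × 256 distinct addresses.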
So we're quickly finding that we're running out of these IP addresses that exist in the world today. And so there's a variety of ways that we're trying to combat this particular problem, because experts keep saying that we're going to run out of IP addresses in, I don't know what they say, 2011 or something. They always say something that sounds very, very close to where we are now, and it's just going to be sort of a big problem. And so there's a couple of ways that we've been trying to fix this. Now one of the ways is with something called IP version six. We're not gonna go into too much detail about this, but IP version six will actually allow us to have an address that's much, much, much longer than the IP address that we've introduced here. So this is just called an IP version four address. It doesn't matter all that much. This is what is used typically on the internet, but it's not the newest system that's available. And so even though we only have four billion addresses here, we have something completely ridiculous, 2 to the 128th power, which is hundreds of billions of billions of billions of billions, of addresses that can be assigned in IP version six. Now this unfortunately, or I guess fortunately, depending on how you wanna look at it, isn't anything that's going to come up very quickly. I think this will be slow to be adopted. And so we have these other things that we have to do to try to solve this particular problem. So let's say then that we have this router and that we have multiple computers associated with it. So we have, let's say, David's desktop and then, because I'm slightly higher tech, my laptop. So now we have to use one IP address for each of our computers. So we're using a total of two. Now, is there any sort of way that we could try to make this easier? Is there any way that we could just share one IP address to try to use fewer? Right, so we could have a server in our home that does what? Okay, good. 
So we might have then a server or maybe even like this router for example that is able to take information from each of these computers and then try to figure out after it sends the information for us into the internet when it gets a reply, it will try to figure out which one this actually came from. So another way to, or a way that we can put this into an analogy is it's a mailing address again. So let's say you have a building that has multiple apartments and each of these apartments doesn't have an external address, right? They all have the same mailing address to that one building but somebody, whenever they get the mail, looks at the name that's on each of these pieces of mail and says, okay, this one goes to this apartment, this one goes to that apartment, this one goes to that apartment. This way we can have more apartments but we don't have to give out more mailing addresses and it's the same sort of thing. So through this other system that's called NAT, this router can actually determine which message this computer sent, for example, and then as soon as it gets a reply from cnn.com, it will route that same information back to the original computer that requested it and so in this way, we can then have one IP address for this specific router or for your router that's sitting at home and then we can assign like sort of a different set of IP addresses. Some IP addresses that aren't made publicly available to the rest of the internet just like you might have apartment A, for example, in this building and apartment B and apartment C and those don't have to be known to the outside world just that one mailing address and then the person that is responsible for sorting the mail is able to take each of these and send it to the appropriate location. So frequently what you will see is that you will have an IP address, especially if you're at home. Now in this case, now that we're on Harvard's campus, this is a public IP address. 
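The bookkeeping that a NAT router does, as described in the apartment-building analogy, can be sketched roughly as follows. This is a toy model and not how any particular router is implemented: the class name, the port numbers, and all of the addresses are made up for illustration, and real NAT devices also reuse and expire entries, which is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class NatRouter:
    """Toy NAT: one public address shared by several private machines."""
    public_ip: str
    next_port: int = 40000  # arbitrary starting point for public-side ports
    # Maps a public port to the (private ip, private port) that is using it.
    table: dict = field(default_factory=dict)

    def outbound(self, private_ip, private_port, dest):
        """Rewrite an outgoing request so it appears to come from the router."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port, dest)

    def inbound(self, public_port):
        """Find which private machine a reply belongs to, or None if unknown."""
        return self.table.get(public_port)


router = NatRouter(public_ip="203.0.113.7")  # hypothetical public address
desktop = router.outbound("192.168.1.2", 5000, "cnn.com")
laptop = router.outbound("192.168.1.3", 5000, "cnn.com")

# Both requests leave with the same public address but different ports,
# and that port is how the reply finds its way back to the right machine.
print(desktop, router.inbound(desktop[1]))
print(laptop, router.inbound(laptop[1]))
```

The table plays the role of the person sorting mail in the building: the outside world sees one address, and the lookup decides which apartment each reply goes to.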
This is one, I'm using one of these four billion IP addresses in the world but at home when I'm using a home router, typically what you see is something very, very different and you might have an IP address that's in a different range, maybe something that starts with like a 10.1 or what's another famous one or popular one that you might recognize? 192.168. Yeah, 192.168, that's another one that's or of course there's two sets of numbers after this but these are generally known as like these private zones of IP addresses that routers can provide to computers within a house or within a small area and these IP addresses aren't made public to you. So whereas I could contact my computer directly by using this IP address, I may not necessarily be able to do the same with my home computers because I have these two computers and they are sharing one public IP address that my router is remembering for me but the router has also given me these private IP addresses maybe in one of these ranges or maybe in some other range that then tell me that I'm on this private network and not on some larger public network that's accessible to the internet. So before we get to the one security concern which you raised earlier, the Great Firewall of China and the like let's consider a smaller scale one. So how many of you have ever sat down at your home computer typed in a domain name but misspelled it and hit enter? So all of us presumably, but what's happened? Yeah, so you'll often reach some page that's just bogus. What do you often see on these bogus websites? Yeah, like a search box, even maybe some mention of Google even though Google does not own that site but you see what specifically? What do you see on these pages? Not just Google. This spam or really they're just littered with ads, right? These sort of faux looking websites that are just a whole bunch of links to not so useful websites. So how is this happening? So it can be a number of different ways. 
So for years it's been common practice to buy domain names. So CNN.com is a domain name. Harvard.edu is a domain name, and you can buy these things, and we'll actually do this potentially later in the term when it comes time to discuss websites. But you can buy domain names, and you can buy domain names that actually look like other people's domain names, in hopes that someone's gonna goof some percent of the time, mistype their domain name, and hey, while they're on your website they might as well click through an ad, and therefore you make a few pennies. This is very common. This is what's known as squatting, and it can happen so that you can gain revenue. Sure. So squatters do this either to just pull in ad revenue, or in hopes that someone will take an interest in that domain name and actually buy it from them. So it's gonna be very awkward. You're gonna have to come with me. No, that's okay. We'll do it. So this is actually becoming a bigger concern now, because now some of these big ISPs, internet service providers, are actually themselves doing this on a much bigger scale. So Dan introduced DNS a bit ago, and DNS is this big overarching system that essentially hears all of the requests you're making for websites and for servers so that it can do something useful: translate those names, those host names, those domain names, mostly synonyms for our purposes tonight, into IP addresses. Those same DNS servers can also respond with any answer they so choose. So in fact, if you try to go to a website that doesn't actually exist, well, on Harvard's network, I'm gonna go out on a limb here and assume that no one actually bought this. It's getting louder. I think you should get that. If I actually go to a website like this that I just completely made up, it's complete nonsense. So it happens to say jerk a lot, so whatever. Okay, well actually, so this is good behavior. 
Harvard is sort of respecting internet standards because when I type in invalid domain name, my browser gets a certain numeric response from Harvard's DNS server that says, that doesn't exist, I got nothing for you. And so your browser displays this, it's somewhat cryptic, but very clearly erroneous page. What's increasingly happening at home if you have Verizon, Comcast or a lot of these big ISPs, they're actually receiving your request for jerk, jerk, jerk, jerk.com saying, oh, this isn't a valid address, but rather than just tell you that and leave it at that, here is a page full of advertisements that we make money off of anytime you click through any one of these links. And this is actually really ticking off, particularly the tech community, because this is actually breaking the behavior of DNS and what the servers are supposed to respond, but why do they do it? All right, so it's money. Now, those of you with Comcast, there is a annoyingly cryptic sequence of steps you can go through to actually opt out of this process. And it may have in fact caused enough of an uproar that ISPs do need to, never lectured this intimately before. The ISPs may very well have to make it an opt out process by now, but those of you who are actually, next time you make a typo at home, take notice of the page you actually get back. And even if you care nothing for the advertisements or don't really care that, oh, don't care to, well, what's interesting now is that the explanation for that symptom you're seeing really reduces to this simple story involving DNS queries. Now, before, I don't know how much longer I can do this. 
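The difference between the two behaviors just described can be sketched with a toy model. This is only an illustration: real DNS is a network protocol, not a dictionary lookup, and the record table, the addresses, and the function names here are all made up. The "no such domain" answer is commonly called NXDOMAIN; in this sketch it is represented by returning None.

```python
# Toy stand-in for the names a DNS server knows about (made-up address).
RECORDS = {"cnn.com": "192.0.2.10"}

# Hypothetical address of an ISP's advertisement server.
AD_SERVER = "198.51.100.99"


def standards_resolver(name):
    """Answer honestly: the address if it exists, otherwise None (no such domain)."""
    return RECORDS.get(name)


def ad_injecting_resolver(name):
    """Never admit a name is missing; answer with the ad server instead."""
    return RECORDS.get(name, AD_SERVER)


typo = "jerkjerkjerkjerk.com"
print(standards_resolver(typo))     # browser can show its own error page
print(ad_injecting_resolver(typo))  # browser is sent to a page of ads
```

For names that do exist, both resolvers give the same answer; they differ only in what happens when you make a typo.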
Let's see, before we take a break, let's see if we can't tease apart some of the jargon that up until now we've intentionally not spewed. But now that we have sort of a picture on the board here, let's see if we can toss out some of those pieces of jargon and buzzwords that you'll actually encounter in typical life, so that you can now relate things you've seen or heard to what we've been discussing thus far. So if you could, I'll go over here. So let's zoom in on this for just a moment. So this device here, we've called a couple of different things already tonight. So we called it a router. Oh, do you want to be my chalkboard person? We got some space up there. Oh, I can do both, oh, fine. So this thing, you can make a list over there. So this device we've called a router thus far. It's often called a home router, and its purpose in life is to take data in and then route it to the destination computers. It also might be called a NAT router, network address translation. That's precisely the technology Dan cited a little bit ago. But if it has these little bunny-ear antennas, it's also what's called an access point or AP. An access point simply describes a piece of hardware whose purpose in life is to provide wireless connectivity to nearby laptops. Now let's dive a little deeper there. An access point generally implements one of a few standards, and the numeric standard here is something called 802.11. Have you ever seen those numbers with the dot in between them? Probably seen it on the shrinkwrap box maybe. I mean, it's fairly cryptic at first glance, but this is just the identifier that some people out there came up with to describe a wireless technology. But there's a few different variants of it. The most popular ones these days are probably G and increasingly N and also B. 
So long story short, each of these letters denotes a different version of this technology, and there are newer and newer versions because the world gets faster and faster and faster at implementing wireless technology. So originally B, which was one of the earliest consumer versions of this technology, was 11 megabits per second. And you may recall our discussion of bits a couple of weeks ago. We'll come back to that later tonight and next time. G upped the ante so that if you have a wireless network that's operating with the G version, it's 54 megabits. So roughly five times faster. So that was good. And N, anyone know what N is? It's even, I think it's not officially even a standard, but enough of the companies have hopped on this bandwagon now. It is now a standard. Enough, so retract that. It is now a standard and it operates at what speed? Oh no, Dan? Yes. I don't know. Several hundreds though, I think. It's a lot more. It's more and we'll check it during break, but we know it's faster and we know it's better because that's what we have. So now why care about this? Well, it speaks to the actual speeds that you might get in your own home, but there's a relationship now here, or there's at least a potential bottleneck. You might very well go out and buy the very latest and greatest as we did. This really isn't working and not having a buffer here. All right, so it's like amateur hour tonight. All right, so let's consider this. So hypothesize that you want a really fast home network. So you go out and spend for G or you go out and spend for N so that you can have 54 megabits or 100 megabits of wireless connectivity. But there is a problem, because this wireless device, this access point, is not the only thing sitting between you, point A, and point B, CNN.com and every other website out there. What other devices are in between me and the rest of the world still? So there's still the other side of this device, which we earlier called also a cable modem or a DSL modem. 
Again, there's been this convergence because for cost reasons, convenience reasons, a lot of these buzzwords we're now tossing up there are all kind of one in the same box, but they're different technologies lumped into the same plastic carton. So the cable modem or the DSL modem in here, this too has a maximum speed. So if it's a cable modem, anyone know what your cable modem speeds actually are? Anyone know what your, what are you paying? You're all paying 50, 60 bucks a month. What are you getting for it? What's that? So it can really vary, but 10 is not uncommon these days, 12 megabits down. So 12 million bits per second of a download speed, but there's often asymmetry. So beware some of the hype. Even though you might get 10 or 12 megabits download speed, what do you often get as upload speed? Two, one. Now maybe that's not such a big deal. And in fairness, most of us probably download more data than we actually upload. But frankly, if you use your home internet connection for work a lot, maybe you send around really big PowerPoint presentations or PDFs or Photoshop files, or maybe you're one of these people who likes to share video content or music content for better or for worse, that requires that you actually upload content. So almost all of us have this asymmetry at home whereby you can download content. Even videos from the course website pretty darn fast. But when Dan and Chris and I try to upload videos to the course website for all of you to reach, I mean, we are often there twiddling our thumbs because our internet speeds are relatively lower. So realize that you can't just latch on to the numbers that might be applied to one technology, your own cable. Your own access point speed, your own home network speed, if you're only paying for so much of a limit in this particular device. So there's one other role that this device also commonly plays. And to tie the question, tie this together with the question earlier, it's also a firewall. 
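The bottleneck and asymmetry points above come down to a bit of arithmetic: a path is only as fast as its slowest link, and upload and download can have different slowest links. Here is a rough sketch using the figures mentioned in the lecture (54 megabits for G, 12 down and 2 up for a cable modem). The 100-megabyte file is a made-up example, and the calculation ignores protocol overhead and real-world wireless slowdowns.

```python
def end_to_end_mbps(*link_speeds_mbps):
    """A chain of links can go no faster than its slowest member."""
    return min(link_speeds_mbps)


def transfer_seconds(size_megabytes, speed_mbps):
    """Idealized transfer time: 8 bits per byte, no overhead."""
    megabits = size_megabytes * 8
    return megabits / speed_mbps


WIRELESS_G = 54   # megabits per second, from the lecture
CABLE_DOWN = 12   # megabits per second, from the lecture
CABLE_UP = 2      # megabits per second, from the lecture

download_speed = end_to_end_mbps(WIRELESS_G, CABLE_DOWN)
upload_speed = end_to_end_mbps(WIRELESS_G, CABLE_UP)

file_mb = 100  # hypothetical video file
print(download_speed, transfer_seconds(file_mb, download_speed))
print(upload_speed, transfer_seconds(file_mb, upload_speed))
```

Upgrading the access point from G to N changes neither result, because the modem remains the slowest link in both directions.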
So what is a firewall? Yeah, it's a means of blocking bad stuff. So viruses, Trojan horses, and we'll talk more about this, especially in our security lectures this semester. And to be fair, it's not quite blocking viruses and Trojans, per se, but it's often blocking in, it's often lower level than that. So simply having a firewall is not necessarily your only defense against getting infected. Because even if you have a firewall, you're probably still capable of receiving emails from people. And viruses and such are often transmitted by way of email. Accidentally, unbeknownst to the sender, but there are things that slip through firewalls. But what firewalls are really good at keeping out is traffic that did not initiate from you behind that firewall. In other words, firewalls, even these fairly dumb cheap ones that come with home routers, that come with cable modems these days, what they essentially do is they allow traffic to go from you out on the internet. And then the response, they allow to come back through the device and come back to you. But they don't let random people on the internet initiate connections from themselves to you unless you have initiated that connection. Now, this is a good thing in principle, but there's increasingly a number of applications that we normal people might like to use, like voice over IP, this technology where you can have phone calls with people over the internet. Well, that's kind of a problem, because if you want someone to be able to call you, that suggests they need to be able to initiate this connection. Or you need to be able to initiate this connection. So very commonly, do problems arise when two parties, A and B, have firewalls between them? But fortunately, there are increasingly some technological workarounds to this. So questions on this home networking aspect of firewalls and such? Yeah? Correct. It generally works on, yes, the TCP IP level, so lower level like that. 
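The rule described above, where replies to traffic you started are let through but connections started by strangers are not, can be sketched as a small lookup. This is a simplified illustration and not the logic of any real firewall product: the class and method names are ours, the addresses are made up, and real devices also track protocols, local ports, and timeouts.

```python
class HomeFirewall:
    """Toy firewall: inbound traffic is allowed only as a reply."""

    def __init__(self):
        # Remote (ip, port) pairs that someone inside has contacted.
        self.initiated = set()

    def outbound(self, remote_ip, remote_port):
        """Traffic from inside is always allowed, and remembered."""
        self.initiated.add((remote_ip, remote_port))
        return True

    def inbound(self, remote_ip, remote_port):
        """Allow traffic in only if we contacted that remote end first."""
        return (remote_ip, remote_port) in self.initiated


fw = HomeFirewall()
fw.outbound("192.0.2.10", 80)            # you request a web page
print(fw.inbound("192.0.2.10", 80))      # the reply is let back in
print(fw.inbound("198.51.100.50", 5060)) # an unsolicited caller is dropped
```

The second case is the voice over IP problem from the lecture: the caller's first packet looks exactly like a stranger knocking.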
Yes, keeping out certain IP addresses and port numbers. And more on that in the future for those unfamiliar. Other questions? So just as a teaser, we won't answer this just yet. We'll take a short break. But also out there in this big cloud are not only those routers, but there are also innumerable firewalls here and there. And in fact, if one of the off ramps to this little cloud we know as the internet is not just a building, but in fact, that's my best drawing of China at the moment, is in fact an entire country. I mean, if you are determined enough to filter traffic reaching your end users, you can absolutely interpose some number of devices between you and the rest of this cloud. And so this is what the so-called great firewall of China is all about, is actually filtering all incoming traffic and all outgoing traffic, which per today's even limited discussion is clearly possible, because anyone who has physical access to any of these hops on the internet, anyone who has physical access to the computers that are connected to that device can absolutely look inside of any of the traffic flowing across this wire. But thankfully, there are some defenses. So more on that after the break. Let's take five. Hello, everyone. Welcome back. So luckily, the ball and chain has been cut. And I am a free man. So we can talk about now much more interesting things than me having to stand awkwardly behind and mock David as he speaks. I'm glad nobody pointed out or laughed while I was making bunny ears. Anyway, so we were talking before the break about China that apparently looks like a set of lips or something. It's not to scale. That is connected via this very thin line to the rest of the internet. And this is, of course, a bit of a white lie. Of course, multiple connections between countries and between these major backbones that exist. 
But we were talking before about how we might have this concept of a firewall or this sort of colloquial term that's been thrown around a lot, the great firewall of China, so to speak, that blocks connections either to the rest of the internet or from the rest of the internet. And we were wondering how this might work. And so to talk about this in a little bit more detail, we should, again, have to go back to talking about the IP addresses itself. So these IP addresses, as you remember, it's this set of four numbers. And you might imagine that we can actually assign entire sets of IP addresses either to geographic locations or maybe to institutions or to companies. Whatever might make sort of logical sense. And in fact, you can imagine that, for example, a number that starts with like 18. Whatever, whatever, whatever, whatever might have some sort of meaning or some representation. And this is actually, it's a complicated subject because there's a lot of ranges of IP addresses and it's very difficult to tell. But there's only a few companies or countries that own what's called a class A. And a class A IP address range is basically any IP address that exists within a specific number. So like that 18.whatever, whatever, whatever, whatever number, for example, that's a class A address just because it has one specific number, one known value before it. Now the class isn't really all that important, but the main idea here is that we can have these ranges of IP addresses that can be associated with either geographical locations or to companies as a whole. And you're not necessarily tied to geographic locations based on IP address. Now there are ways that you might be able to figure out the basic area that an IP address is coming from if you know all of the IP addresses and all of the geography that's associated with them. But in general, that's not necessarily true. 
But for an entire country, you might be able to say, okay, an IP address in the range 19.240 or something like that, just making that number up might exist, for example, in China. And so China then might have all of these large servers that might exist right at the edge of these very large connections that exist between them and the outside world. And these servers might look at these requests that are going back and forth from other computers in the country to computers out in the internet. So for example, cnn.com or the other way around. And it might be able to say, okay, well, I have decided that an IP address in this range is blocked and therefore I'm not going to allow this particular connection to happen. And there's been a lot of speculation about whether or not this is actually true. But anecdotally, I was in China in January and it's very, very true. They block a lot of websites. And for example, Facebook, they do not allow you access to Facebook. So even though you can use via your computer, you can contact the DNS server and find out what the IP address of Facebook is. What happens when you actually try to contact that computer with that IP address is that there's these large servers somewhere in China actually say, I don't think so. And they cut it off. And then you see that same message that you saw on David's computer before where it said, cannot connect to this server because there might be some problem with it. And so you might imagine that there are ways around this problem and what might be a way that we can get around this sort of firewall problem. I'm very lucky I didn't get caught, by the way, but I did do this myself. Statically putting the IP address in. Statically putting the IP address in. Okay, so I'll address that one first and then I'll come back to VPN. So generally that won't actually work if you just wildly change your IP address to some other value. 
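The idea of a device at the border deciding that "an IP address in this range is blocked" can be sketched with Python's ipaddress module. The blocked range below is made up (it is a range reserved for documentation), and nothing here claims to describe how any real national filter is built; it only shows that checking an address against a range is a simple test.

```python
import ipaddress

# Hypothetical list of ranges the border device refuses to talk to.
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(destination):
    """True if the destination address falls inside any blocked range."""
    addr = ipaddress.ip_address(destination)
    return any(addr in network for network in BLOCKED_RANGES)


print(is_blocked("203.0.113.5"))   # inside the blocked range
print(is_blocked("198.51.100.5"))  # outside it, so the request goes through
```

Note that the DNS lookup still succeeds in this picture. The block happens afterwards, when the request for that address reaches the filtering device.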
Most of the time the router that you're first connected to won't recognize that IP address is being in the acceptable range that should be connected to it, if that makes sense. So a router might know, let's see if I bring this back up. A router might know, for example, you notice that this router in particular is 140.247.40.1. And so in combination with the subnet mask where we're not really gonna get into that too much, your computer is then able to say, okay well all of these numbers within a range, and in this case 40.1 to about 40.255 can be connected to that router. But any number outside of that range it's not going to understand that IP address. It won't be able to contact back and forth. So unfortunately that may not work and it could still detect, even if it's from a different IP address, it could still detect because it still has to go through these servers. So there was another idea that we could use, say, VPN. And this might actually work. The idea, the concept of a VPN, is that it actually gives you very basically an IP address that's somewhere else. So let's say that I have a computer at home and what I want to do is make my computer look like it's operating from Harvard's network. And this might be useful for a variety of reasons. So let's say that Harvard has some internal things that will only work with other computers on Harvard's network or even for companies, they might have internal servers only that will only work by communicating with other machines within that same network. A VPN allows you to connect and act as though you are on that computer even though or rather on that network, even if you're not geographically in the same location. So whereas before here, now my computer is connected to this router at home and it's connected to Comcast or to Verizon. What I can do is I can create a VPN, what's called a VPN tunnel. 
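The "acceptable range" check mentioned above for the router at 140.247.40.1 can be sketched as follows. The subnet mask is not given in the lecture, so 255.255.255.0 is an assumption here, chosen because it matches the "40.1 to about 40.255" range that was described.

```python
import ipaddress

# Assumption: a 255.255.255.0 mask, which makes the last number the only
# part that varies between machines behind this router.
subnet = ipaddress.ip_network("140.247.40.1/255.255.255.0", strict=False)


def on_local_subnet(address):
    """True if the address is in the range this router expects to serve."""
    return ipaddress.ip_address(address) in subnet


print(subnet)                             # the range in network/prefix form
print(on_local_subnet("140.247.40.200"))  # a legitimate neighbor
print(on_local_subnet("10.1.2.3"))        # a made-up address picked at random
```

This is why typing in an arbitrary address by hand generally doesn't help: the first router sees an address outside its range and has no way to deliver replies to it.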
And it's still going through this same physical connection, but it's as if my computer now has a 140.247 address and it's as if I am on Harvard's network. So usually, and because this tunnel is encrypted, which means it's like that connection that you have with a banking website, for example, so it's encrypted and so somebody that tries to read the information won't be able to, usually this means that you can VPN into another computer and you might be able to then go to that website, that blocked website that didn't work before. Now sometimes it's not always going to be true. VPN, it doesn't always have to work that way, but in general, that will certainly be the case, yes. Yes, so VPN in order to act is though it were coming from this network, you must be connected to the internet. Otherwise there's going to be no connection to be able to act on, if that makes sense. So like if this didn't exist, then there's no way that we could connect to Harvard servers and request that we be on that particular network. So that connection would also be dropped. So you must have an internet connection in order for VPN to work. Now there's another way that this could work as well, this same sort of concept, and it's this idea of having what's called a proxy. And that is to have another computer or another server somewhere on the internet act on your behalf. So you could, and it's similar in spirit, but it's very different in reality from a VPN server where through VPN you're sending all of your requests through the internet, then to some server at Harvard, and then that server is sending out that data on your behalf. But with the proxy what happens is that you send the data directly to that computer and then it just forwards that information. And sometimes you'll be able to get through firewalls if you are able to encrypt this information from your computer over to the proxy. Question. So she already figured out how to circumvent these protections. 
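The proxy idea, another computer acting on your behalf, can be sketched with a toy simulation. Everything here is made up for illustration: the filter is modeled as a simple list of blocked names, proxy.example is a hypothetical proxy, and no real network traffic is involved. It also assumes the filter only looks at where your connection is going, not at what is inside it.

```python
# What the hypothetical filter refuses to let you reach directly.
BLOCKED_HOSTS = {"facebook.com"}

# Stand-in for the content each site would serve.
PAGES = {"facebook.com": "<facebook page>", "cnn.com": "<cnn page>"}


def filter_allows(host):
    """The filter only checks the destination you connect to."""
    return host not in BLOCKED_HOSTS


def fetch_direct(host):
    """Connect straight to the site; fails if the filter blocks it."""
    if not filter_allows(host):
        return None
    return PAGES[host]


def fetch_via_proxy(proxy_host, target_host):
    """Connect to the proxy, which fetches the target from outside the filter."""
    if not filter_allows(proxy_host):
        return None  # the filter has learned about this proxy
    return PAGES[target_host]


print(fetch_direct("facebook.com"))                      # blocked
print(fetch_via_proxy("proxy.example", "facebook.com"))  # relayed by the proxy
```

The weak point is visible in the code: once the proxy's own name is added to the blocked list, the second call fails just like the first.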
Okay, okay, so for the camera, a lot of schools, a lot of libraries, this kind of thing, will often filter internet traffic. But as Dan said, there exist these things called proxies. Some proxies are web-based, which means you go to something.com. They have a little text box there that you type then cnn.com into. Here we go again. That you would then type cnn.com into. All right, good. As luck would have it. So essentially you submit the address of the website you want to visit at school, they then go out on the internet, get the contents from cnn.com. They then relay it back to you. And so they are literally a proxy in the conventional sense. They are doing something on your behalf and serving as this sort of middleman. All it would take for the school administrators to stop that would be to find out the name of this proxy service. And there's even more sophisticated ways of getting around it. The VPN solution is fundamentally much more sophisticated, much more robust, because it literally is encrypting everything between A and B. And so long as you trust the party you're VPNing into, like Harvard, not to disclose what it is you're doing and what websites you're visiting, that's definitely a robust approach. But even then, China or other entities, companies, could in theory filter out VPN traffic, because all internet traffic has unique numbers associated with the type of traffic. But even then, if they try doing that, you can also spoof your kind of internet traffic and pretend that your VPN connection is actually just an email going across the internet. So it's really a cat and mouse game, and it requires much more sophistication to circumvent those protections. But I will say anecdotally, we just got an email from someone on the internet recently who was trying, bless their heart, to tune in to computerscience1.tv, where we have the course's videos from years past, where we've apparently been blocked in China. I don't know if I told you. 
So you cannot tune into this course in China at the moment. But she was actually asking how to circumvent that. And it's hard to offer legal advice over email. So we didn't really give a solution. But thanks to Dan, as of just now, you have three ways. Other questions? Other questions? OK, so there's not just encryption happening at this VPN level. There's also encryption that might be happening closer to home. So you can also protect yourself a little closer to home. And this is something you do have control over. So just a few years ago, you could go into your own home. You could go into a friend's home, sit down at your laptop. And when you go to connect to local wireless networks, odds are you'd have like a smorgasbord of options to choose from. And you would usually choose the one that seemed to have the strongest signal. These days, you often see what kind of icon next to all of those so-called access points? So you see a padlock of sorts, which suggests that the connection is not locked, per se, but it's encrypted. It's scrambled. To encrypt information means to scramble it using fairly fancy mathematics that are generally, but not always, hard to decrypt, unless you are the designer of that particular setup. So in the world of wireless networking, back to this device here, for years, the standard was to use a protocol called WEP, W-E-P, wired equivalent privacy. The irony was it was completely broken. And don't even bother writing this down, unless you're writing, do not use WEP. Because it was completely buggy and easy, it turns out, to crack. In fact, you can if you go into a home that's still using WEP to encrypt their wireless traffic. 
If you sit there long enough on your own laptop and have the right piece of software installed on your computer and listen to enough of the wireless traffic that's just flowing through the air, you can, after some number of minutes generally, figure out what the secret password is that that person is using for their home network, type it into your own computer, and voila, you're now on their network. It was really kind of a joke. So the world came up with a couple of new things, among them WPA, or its newer variant, WPA2. So this is better. And in fact, if you take one thing away from tonight security-wise, it's this: if you bought your own home router several years ago and you remember, or you have the user manual with which to figure out, how your wireless settings are configured, take a look at your configuration page. And to tie this into something Dan said, odds are the IP address of your home router is something like 192.168.1.1. If you've ever seen that number somewhere, odds are you can type that into a web browser in your home network and then pull up the configuration page for your router. Anyhow, if you are resourceful enough to figure out how to check your router's configuration and you find that it only supports WEP and you're moderately paranoid, it's time to go spend another $20, $30, and buy a newer one that supports WPA or WPA2. Thankfully, though, if you do have a relatively newer router or access point that supports the newer protocol, so long as you choose a password that's reasonably long, eight characters, 10, 20, whatever, and not just 1, 2, 3, 4, 5, or something silly like that, your traffic is pretty well encrypted, which means if you are sitting there at home communicating with the outside world, your traffic, at least between you and this middleman device, the router, will be encrypted. But with that said, if all you're doing is visiting Facebook, generally, Facebook's traffic is not encrypted. 
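On the point about password length above, a quick count shows why "1, 2, 3, 4, 5" is silly and eight or more mixed characters is not. This sketch only counts how many passwords of a given shape exist. The choice of a 62-character alphabet (upper case, lower case, digits) is our assumption for illustration, and it says nothing about how WPA itself processes the password.

```python
def keyspace(alphabet_size, length):
    """Number of distinct passwords of exactly this length."""
    return alphabet_size ** length


five_digits = keyspace(10, 5)    # passwords like 12345
eight_mixed = keyspace(62, 8)    # letters (both cases) and digits
twenty_mixed = keyspace(62, 20)

print(five_digits)
print(eight_mixed)
print(eight_mixed // five_digits)  # how many times larger the search is
print(twenty_mixed)
```

Someone guessing every possibility has a hundred thousand tries to make in the first case and hundreds of trillions in the second, and each extra character multiplies the work by another 62.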
So the moment your traffic reaches your home router and then goes out that cable modem or DSL modem and then through those 30 or fewer hops on the internet, odds are almost all of the traffic at that point in the story is not, in fact, encrypted. So while no snooping neighbor or person in your household can figure out what you're doing, there's frankly millions of other potential people on the internet who could, but the upside is most of them don't really care what little old you is doing on your computer. So you have sort of an anonymity protection there. Questions on WEP or WPA? No? All right, so one last detail. One thing we haven't focused in on, silly though it is, is this thing here. What is it that you're connecting this cable to on your own computer? What's that called? A NIC, so NIC, N-I-C, Network Interface Card, otherwise known as an Ethernet card, otherwise known as a network card and probably some other things. This is simply one of those green cards that I think Dan held up a couple of lectures ago on camera, a little logic board whose purpose in life is to empower a computer to actually communicate on a network. Into that network card or Ethernet card goes, not surprisingly, an Ethernet cable. And the other end of that cable goes into what device? Just to make sure we're following along. Yeah, so the home router, which might just be the cable modem if you own one computer, might just be the DSL modem if you own one computer, but if you've got a bunch of computers or a bunch of family members in your home, odds are you have a device with multiple Ethernet jacks, or your own personal router or switch that you bought and connected to that device, and so you can connect multiple computers to it. And if you want one tech support troubleshooting tip here, here's a curious thing. Many cable modems and many DSL modems remember which Ethernet card was connected to them most recently. Every Ethernet card has a unique address. It's not an IP address.
It's a different kind of serial number. It's actually a bit longer. But it's a unique address that identifies that Ethernet card. When you plug your computer into a cable modem or a DSL modem, that device tends to latch onto that address for whatever reason. And it just remembers it. And the funny thing is, and this is not an uncommon problem to run into in your own home when you're troubleshooting something, if you then unplug your computer and just plug in a friend's computer because they want to use your internet connection and you didn't splurge and buy a home router or anything like that, it might not work. Even though technologically, it should. It's not that Comcast or Verizon cares whose computer is connected, but that device has just remembered who was connected most recently. So the solution to this problem, if you know, is to do what? Reboot it. Reboot it. So there's this buzz phrase, power cycle something, which means to literally unplug its power connection from the wall. Usually wait five seconds, maybe 30. It's usually arbitrary; for people like us, it's until you get bored. Plug it back in. And then let it boot up. A bunch of lights will start blinking. And odds are, if you're ever having a weird problem like that and it involves swapping computers or installing new networking devices in your home, odds are you just need to power cycle the device. Worst case, you have to call someone and they need to send a special signal. It's pretty rare that that should be the case. Yeah? Isn't that called a MAC address? A MAC address, yes. An Ethernet address is synonymous with something called a MAC address, which is not Macintosh. It's Media Access Control, so it's all caps. So that's another acronym you might see somewhere in your computer's innards. It's actually sort of a pet peeve that people tend to write Macs, the computers, in all caps, even though that's something completely different. Then they're talking about this media access control, this Ethernet ID.
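As an aside to the MAC address discussion, here is a minimal sketch in Python, not something shown in the lecture. It relies on two facts: a MAC address is 48 bits long (an IPv4 address is 32, which is why the Ethernet address is "a bit longer"), and it is conventionally written as six two-digit hexadecimal bytes. The helper name `format_mac` is made up for illustration; `uuid.getnode()` is a real standard-library call, though Python's documentation notes it may fall back to a random number if no hardware address can be found.

```python
import uuid


def format_mac(value: int) -> str:
    """Render a 48-bit integer as six colon-separated hex bytes."""
    if not 0 <= value < 2**48:
        raise ValueError("a MAC address is 48 bits")
    raw = value.to_bytes(6, "big")  # 6 bytes = 48 bits
    return ":".join(f"{byte:02x}" for byte in raw)


if __name__ == "__main__":
    # Hypothetical usage: print this machine's hardware address.
    # The value differs per machine (and may be random, see above).
    print(format_mac(uuid.getnode()))
```

The output is in the same form as the example Ethernet ID shown on screen, and it will be different on every network adapter.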
So make sure you get your capitalization right or you will be made fun of. By Dan, yes. Or the power cord from the device itself. Yep, because these things usually don't have on-off switches. When they're plugged in, they're on. So yeah. And honestly, we probably shouldn't admit this, given that we preach all this fancy, sophisticated stuff in this course, but 9 times out of 10, the solution to your problem is to reboot your computer. So irrespective of all the stuff we say, reboot your computer and your problems will often be solved. Other questions? Sorry, Tis, but we usually wait to the last lecture to tell you that, but sorry. So just to make this Ethernet ID more concrete, you can see an example. This one is mine, for example. And so you don't really have to remember it. And it always is the case that each one is unique. So not only do you have an Ethernet ID for your wireless adapter, but you might also have one for the wired connection that you have here. And this wired connection, actually, there's some interesting things also about this. So there's a variety of speeds that exist for this today. And it depends not only on the cable that exists between your computer and the device, but also on how fast a connection your computer and your router can actually support. So sometimes you might see different values, like Ethernet, for example, which is able to support 10 megabits per second, or Fast Ethernet, which can do 100 megabits per second, or we never looked up in. Oh, right, I'm on it. Or even now these days there's gigabit speeds, which is 1,000 megabits per second. For a long time computers had gigabit capabilities in these Ethernet ports, but the routers did not. And so nowadays we're finally seeing routers more affordably able to have gigabit speed connections. And so you might remember that this cable modem doesn't quite have that same sort of speed.
We're talking about 12 megabits, or even nowadays, like the fastest you will typically see with Verizon's FiOS, or now Comcast even around here is starting to go up to 50 megabits per second. But we have these really, really fast speeds that can exist between a computer and a router. And what's the point? Why might we care that we can have a really fast connection between the computer and a router? Yeah, right, exactly. So to communicate a file, for example, from this desktop to this laptop, if they're both connected using gigabit Ethernet, transferring that file will be really, really fast, because it doesn't have to go through the internet, obviously, it can just go through this router. And as long as your computer supports that speed and the cables you're using support it, and the router as well, you'll be able to get a very fast connection between the two. And so this can actually make a difference when you're trying to decide why on earth would I want to use a wired connection versus wireless. So even ignoring the problems with security, for example, where, if you're using an unencrypted network, anybody would be able to see what traffic you're sending back and forth. But now you could be able to transfer files between your computers very, very quickly, very, very easily. And with the kinds of things we're transferring, like, for example, iTunes rentals, where you download a movie on iTunes and you're trying to share it from one computer to another, it can be a painful experience unless you're using a very, very fast connection. Are you able to find out the speed? Yes, 802.11n can go up to 600 megabits per second on an optimal local network. So all of these speeds that we quoted you for the wireless, those are sort of the theoretical maximum. You almost never see that maximum speed coming out of these. I mean, even if you were literally sitting right next to the router, you will not get that speed.
But it certainly is an indicator of the relative speeds between the two. So it's sort of a big leap from G to N, for example, and you'll notice a very big leap in speed from one, excuse me, from one of these protocols to the other. And it's a good chance for a little caveat emptor here. I pulled up comcast.net here, comcast.com. I went to their shop page and then their products page, because I actually just went through this myself recently. So they actually have lots of different tiers of internet service these days. There's a lot of hype here. I mean, most of this level of detail is irrelevant. Especially after exiting this class, the kinds of numbers you should care about are the download speeds in terms of megabits per second. I don't want to know how many sound files I can download based on someone's random approximation. You want to actually look at the hard data. And so for this performance special offer, which is an intro price, this is only for new customers, internet service has not gotten that cheap. It looks like, if you can see the small text, what do you get for this intro offer in terms of speeds? 15 megabits. 15 megabits down. So notice capital M is mega. Little b is bits, not bytes. So that makes a difference by a factor of 8. So megabits per second. And then notice the asymmetry: uploads, by contrast, are only three megabits up. Now fast forward to the bottom of this page. And you'll see that they have this new level of service called Extreme 50. And the irony is I actually just signed up for this myself, but not at all because I want 50 megabits down. There is not a website out there, including my own servers here at Harvard, from which I can download content at 50 megabits per second. That's really damn fast. And most web servers, even for popular websites that let you download movies and videos and all of this kind of stuff, even then they're not gonna give little old you 50 megabits of performance between them and you.
Imagine if they did that for you, can you even fathom how much bandwidth, how much speed they need to be able to provide in total to everyone? So why would anyone besides me pay for 50 megabits down or pay for this particular plan? In fact, push back on me, why would even I pay for this if I just said there's no way I'll ever need that much? Sorry? For me, for work purposes, it's just the upload speed. So yes, I'm doubling my download speed. Frankly, I don't really care, but my upload speed will soon be 10 megabits. And even that's probably more than most sites can sustain. But because much of what we do is moving big video files around and moving big PowerPoints and other types of research type files around, that alone makes it worthwhile. Now for those of you who are not just you in your apartment but maybe have roommates or family members, then maybe these numbers like 50 megabits down start to become a bit more compelling, because if there's five of you there actually downloading at once and you're all downloading some movie or doing something that's bandwidth intensive. Bandwidth, to be clear, is the measure of bits that can flow between two points, A and B. Throughput is, loosely, a synonym for this concept. Then when you actually have multiple users, it might actually make more of a difference. So don't feel that you need to pay for so much more service if it's not even clear to you how much of it you're using at the moment. So there's one other issue that's germane to many of us in the room now, these kinds of devices with which you can get on the internet as well. And Dan actually took out a moment ago a little thing called a dongle, sort of a generic term for a little thing you plug into your computer, that similarly allows his laptop to get onto the network by way of what company? Well, this one's Verizon. Just wanted to involve you in the discussion. Yeah, thanks.
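To make the earlier bits-versus-bytes point and the shared-household scenario concrete, here is a minimal sketch in Python. It is an idealized back-of-the-envelope calculation, not something from the lecture: it assumes the link always runs at its advertised speed and is split evenly among users, which real connections rarely do. The function names and the 700-megabyte movie size are made up for illustration; the 15, 3, and 50 megabit figures are the plan speeds quoted above.

```python
def megabits_to_megabytes(mbps: float) -> float:
    """Convert megabits per second (Mb/s) to megabytes per second (MB/s)."""
    return mbps / 8  # 8 bits per byte, hence the factor of 8


def transfer_seconds(file_megabytes: float, mbps: float, users: int = 1) -> float:
    """Seconds to move a file when `users` share the link equally (idealized)."""
    per_user_mbps = mbps / users
    return file_megabytes * 8 / per_user_mbps


if __name__ == "__main__":
    movie_mb = 700  # hypothetical file size in megabytes

    # The 15 megabits "down" plan expressed in megabytes per second.
    print(megabits_to_megabytes(15))

    # One person downloading at 15 Mb/s, then uploading at 3 Mb/s.
    print(transfer_seconds(movie_mb, 15))
    print(transfer_seconds(movie_mb, 3))

    # Five people sharing a 50 Mb/s plan get about 10 Mb/s each.
    print(transfer_seconds(movie_mb, 50, users=5))
```

The asymmetry shows up directly: the same file takes five times longer to upload at 3 megabits than to download at 15.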
So how do these technologies actually work, or what are these technologies called with which you can get on the internet on these mobile devices? Because it's obviously not a cable modem and it's not a DSL modem. It seems to be different. 3G? Yeah, so 3G, which years ago was kind of promised to us as this mecca of internet service in a mobile fashion, really kind of sucks, frankly. Any of you who have a mobile device in your pocket, it's kind of nice to be able to check your mail and whatnot, but it is not a particularly pleasant or blazing experience as of yet. But so what network are these devices actually using to get you on the internet? Yeah, so just the, well, not, what was the word? Not Wi-Fi. So Wi-Fi is actually the consumer term for these things. 802.11, Wi-Fi. That's how folks kind of simplified it for most people. But that's the same thing there. So it's the cellular network, so the mobile networks. They actually now have the infrastructure to transmit data in addition to voice, because it turns out almost all of our cell phones today are not analog devices that send little waveforms, on which you thus get static, but they're actually digital themselves and send zeros and ones. All right. So, questions. Anything at all? Open floor. All right, so let's round out our discussion of DNS and then we'll conclude with a look at a fun video. Oh, thank you. So, I could go anywhere, couldn't I? No. So we started today's discussion looking at the backbone of the internet and then we quickly segued to this discussion of DNS, which again converts host names to IP addresses, and vice versa. But we never quite asked or answered the question, how do you even go about getting those domain names? So harvard.edu has been around for some time and cnn.com has been around for some time. But what if little old you or little old me wants to get our own domain name? Just how hard is it? Frankly, within five minutes, we could probably have one.
So there's this kind of stupid name for a website, but it's the biggest internet registrar, called godaddy.com. They're actually known for having kind of crazy ads during the Super Bowl and spending their money accordingly, but they're quite big as a result. Also, as you may find, if you buy your own domain name at some point, they try to upsell you at every point in the process. So the irony is that buying a domain name is actually as simple as buying a book from Amazon, except they throw so many different options and features and jargon at you that it feels like it's a lot more difficult than it is. But there are various agencies in the world that manage the DNS system. So there has to be someone coordinating all of this because there are some rules over who can buy the various TLDs, top level domains. And by that, I mean the .com, the .net, the .gov, the .org, the .edu, the .something, those are TLDs. Now, who can buy .coms from what you know? Companies, but in fact these days, anyone. Even I could go out and buy, for just apparently $1.99, a .com address. Anyone can buy a .org address. Anyone can buy a .net address. Can anyone buy a .gov address? So no, the U.S. kind of put the stake in the ground long ago and controls the .gov TLD. .edu, similarly, is overseen by an organization that does not allow just anyone to get a .edu address. But there's more than just .com. This is a website again where I can just type in something like whatever I did before, jerk, jerk, jerk, jerk, jerk.com. And I could buy that because apparently, or at least, it appears to be available based on the fact that my lookup for it earlier failed. But there could be other reasons for that. I don't quite remember what to type. But for now, let's just focus on this. It looks like GoDaddy sells not only .coms but .info, .me, .mobi, .us, .biz, .mx. What's interesting is that there's a smattering of different types of domains in this list.
Some of these are the conventional .com, .net, .orgs that you're familiar with. But what strikes you as a weird one? One at a time. So .me. So .me, actually, which country does that belong to? OK, so it's actually a good segue. Let's do this right now. So .me sounds interesting. You'd think, oh, that's for David.me or Dan.me, if you want to have your personal website. But I'm actually going to search for .me TLD. This is going to bring us to Wikipedia. And it actually is the country code TLD, a special type of TLD that's supposed to be owned by a country, for Montenegro. So this is a country that has apparently decided that, presumably because a lot of the English-speaking world knows the word me to mean me, people might like to pay us $1.99 or $10.99 for that particular domain name. So they essentially have opened up what was originally supposed to be their citizens' TLD, for their own websites and companies run within the country, to anyone in the world. This has happened with the small island nation of Tuvalu, whose TLD is what? Anyone know? .tv. We pay them $39.99 a year to have computerscience1.tv. Why? Just sounded kind of cool. And so we splurged, instead of spending $10, which is the typical price for a .com. But long story short, if you have the money to afford the $1.99, $10.99, $39.99 domain name, you go to a website like GoDaddy. You fill out the form. You add the thing to your cart. And you check out. And then you simply tell the so-called registrar what server in the world is going to physically host your website, where your file is going to be, where your image is going to be. So you have to tell them to tell the rest of the world where to find you. And that ultimately leads to the DNS system. Because once you've told them where in the world your server lives, that's how they then know how to answer a question of the form, where is computerscience1.tv? They check their database. They check what you told them.
And then they tell the rest of the world what your server's numeric IP address is. Questions? Yeah? Do all DNS servers have the same information? Good question. Do all DNS servers have the same information? So no. So there's actually a hierarchy to the domain name system, for performance reasons, that we've kind of glossed over. We earlier said that there's really just DNS servers here in Comcast or Verizon. It turns out they don't know the answer to all questions. But there are bigger DNS servers than just theirs. So if they ever don't know the answer to a question, they go ask another DNS server that they've been configured to talk to. And ultimately, the buck does stop somewhere. There are these sort of super root level domain name system servers that know where all the .coms are, where all the .orgs are. And so that information eventually trickles down. But for performance reasons, and this is a recurring theme in this course and in computing in general, servers like Comcast's and Verizon's will, quote, unquote, cache those answers, C-A-C-H-E, to actually remember the answers for a few minutes, for a few hours, maybe a few days so that they don't have to waste time and waste bandwidth going and re-asking someone the same question again and again. And so even your own computer sometimes caches those responses, so you don't have to constantly go back to Comcast with questions. Yeah? Uh-huh? Is it server? It depends. So for the camera, whose server do you use to host a website? What do you tell GoDaddy? No, it wouldn't be Yahoo or Google. In addition to buying a domain name, which comes with a yearly cost, a renewal cost, you would also typically own your own server on your own network, your own company or campus, or you pay someone else to rent space on someone else's server. And there are dozens, hundreds of companies that provide web hosting services. You would tell GoDaddy what that hosting company's IP address is.
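To illustrate the caching idea just described, here is a minimal sketch in Python of a chain of resolvers, each remembering answers for a while before asking the next server up. It is a toy model under loud assumptions, not how real DNS software is written: the class name `CachingResolver`, the two example names, and their addresses (taken from ranges reserved for documentation) are all made up, and each resolver here picks its own time-to-live, whereas in real DNS the lifetime is set on the record by the domain's owner.

```python
import time

# Hypothetical data: what a registrar was told about each name.
AUTHORITATIVE = {"example.com": "192.0.2.10", "example.org": "192.0.2.20"}


def authoritative_lookup(name: str) -> str:
    """Stand-in for the server where the buck stops."""
    return AUTHORITATIVE[name]


class CachingResolver:
    """Answers from its cache when it can; otherwise asks upstream."""

    def __init__(self, upstream, ttl_seconds=300.0, clock=time.monotonic):
        self.upstream = upstream      # function: name -> IP address
        self.ttl = ttl_seconds        # how long an answer is remembered
        self.clock = clock            # injectable so tests can fake time
        self.cache = {}               # name -> (ip, expiry time)
        self.upstream_queries = 0     # how often we had to ask someone else

    def resolve(self, name: str) -> str:
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]           # cache hit: no upstream traffic
        self.upstream_queries += 1
        ip = self.upstream(name)
        self.cache[name] = (ip, now + self.ttl)
        return ip


if __name__ == "__main__":
    isp = CachingResolver(authoritative_lookup, ttl_seconds=300)
    laptop = CachingResolver(isp.resolve, ttl_seconds=60)
    for _ in range(3):
        print(laptop.resolve("example.com"))
    # Only the first lookup travels up the chain; the rest are cache hits.
    print(laptop.upstream_queries, isp.upstream_queries)
```

The design choice worth noticing is that the laptop and the ISP use the same class: each level of the hierarchy behaves the same way and differs only in whom it asks next.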
So it's independent of websites like Google and Yahoo that we're familiar with. Shall we roll the film? All right, so let's see if we can't conclude with the visualization of what's going on on the internet. I would say that this film, this animation, developed by some folks at Ericsson's Media Lab years ago, is mostly accurate. But they took some artistic liberties, shall we say. So let me go ahead and pull this up. Do you want to dim the lights, huh? All right. All right. And I give you Warriors of the Net and iTunes together. It's about 10 minutes long. When we talked about TCP/IP earlier, there's other protocols. One of them is UDP, which serves a slightly different purpose. When we were using traceroute earlier, we were sending what are called pings to each of those intermediate hops, not pings of death, though. Different type. For the first time in history, people and machinery are working together, realizing a dream. A uniting force that knows no geographical boundaries, without regard to race, creed, or color. Communication truly brings people together. This is the dawn of the net. Want to know how it works? Click here to begin your journey into the net. Now exactly what happened when you clicked on that link? You started a flow of information. This information travels down into your own personal mailroom, where Mr. IP packages it, labels it, and sends it on its way. Each packet is limited in its size. The mailroom must decide how to divide the information and how to package it. Now, the package needs a label containing important information, such as sender's address, receiver's address, and the type of packet it is. Because this particular packet is going out onto the internet, it also gets an address for the proxy server, which has a special function, as we'll see later. The packet is now launched onto your local area network, or LAN. This network is used to connect all the local computers, routers, printers, et cetera, for information exchange within the physical walls of the building.
The LAN is a pretty uncontrolled place, and unfortunately, accidents can happen. All types of information: these are IP packets, Novell packets, AppleTalk packets. They're going against traffic, as usual. The local router reads the address, and if necessary, lifts the packet onto another network. The router, a symbol of control in a seemingly disorganized world. Sorry about that. This one here, this one here. This one here, this one here. I don't like this. This one here, this one here. I don't like this. Conservative, and sometimes not quite up to speed. Sorry. But at least he is exact. For the most part. Okay. Okay. Okay. Packets leave the router. They make their way into the corporate intranet, and head for the router switch. Even more efficient than the router, the router switch plays fast and loose with IP packets, deftly routing them along the way. A digital pinball wizard, if you will. Packets arrive at their destination. They're picked up by the network interface, ready to be sent to the next level. In this case, the proxy. The proxy is used by many companies as sort of a middleman in order to lessen the load on their internet connection, and for security reasons as well. As you can see, the packets are all of various sizes, depending upon their content. The proxy opens the packet and looks for the web address or URL. Depending upon whether the address is acceptable, the packet is sent on to the internet. There are, however, some addresses which do not meet with the approval of the proxy, that is to say corporate or management guidelines. These are summarily dealt with. We'll have none of that. For those who make it, it's on the road again. Next up, the firewall. The corporate firewall serves two purposes. It prevents some rather nasty things from the internet from coming into the intranet. And it can also prevent sensitive corporate information from being sent out onto the internet. Once through the firewall, a router picks up the packet and places it onto a much narrower road, or bandwidth as we say.
Obviously, the road is not broad enough to take them all. Now you might wonder what happens to all those packets which don't make it along the way. Well, when Mr. IP doesn't receive an acknowledgement that a packet has been received in due time, he simply sends a replacement packet. We are now ready to enter the world of the internet. A spider web of interconnected networks which span our entire globe. Here, routers and switches establish links between networks. Now the net is an entirely different environment than you'll find within the protective walls of your LAN. Out here, it's the Wild West. Plenty of space, plenty of opportunities, plenty of things to explore and places to go. Thanks to very little control and regulation, new ideas find fertile soil to push the envelope of their possibilities. But because of this freedom, certain dangers also lurk. You'll never know when you'll meet the dreaded ping of death. A special version of a normal request ping which some idiot thought up to mess up unsuspecting hosts. The paths our packets take may be via satellite, telephone lines, wireless, or even trans-oceanic cable. They don't always take the fastest or shortest routes possible, but they will get there eventually. Maybe that's why it's sometimes called the World Wide Wait. But when everything is working smoothly, you can circumvent the globe five times over at the drop of a hat, literally, and all for the cost of a local call or less. Near the end of our destination, we'll find another firewall. Depending upon your perspective as a data packet, the firewall could be a bastion of security or a dreaded adversary. It all depends on which side you're on and what your intentions are. The firewall is designed to let in only those packets that meet its criteria. This firewall is operating on ports 80 and 25. All attempts to enter through other ports are closed for business. Port 25 is used for mail packets.
Port 80 is the entrance for packets from the internet to the web server. Inside the firewall, packets are screened more thoroughly. Some packets make it easily through customs, while others look just a bit dubious. The firewall officer is not easily fooled, such as when this ping of death packet tries to disguise itself as a normal ping packet. It's okay, it's okay, no problem. Have a nice day. We are here. Bye. For those packets lucky enough to make it this far, the journey is almost over. It's just to line up on the interface to be taken up into the web server. Nowadays, a web server can run on many things. From a mainframe to a webcam to the computer on your desk. Why not your refrigerator? With the proper setup, you can find out if you have the makings for chicken cacciatore or if you have to go shopping. Remember, this is the dawn of the net. Almost anything's possible. One by one, the packets are received, opened, and unpacked. The information they contain, that is, your request for information, is sent on to the web server application. The packet itself is recycled, ready to be used again and filled with your requested information. Addressed and sent out on its way back to you, back past the firewall, routers, and on through to the internet, back through your corporate firewall and onto your interface. Ready to supply your web browser with the information you requested. That is, this film. Pleased with their efforts and trusting in a better world, our trusty data packets ride off blissfully into the sunset of another day, knowing fully they have served their masters well. Isn't that a happy ending? That last part's the false part. So anyhow, you have seen tonight that in just 200 milliseconds, we can be across the world in Japan. So we'll leave you with that. Till next week, see you then.