All right. Welcome officially to Computer Science E1. This is lecture four, the internet. So it's in tonight's lecture and next week's that we begin to take for granted that computer hardware exists and that we understand a little bit about it from the past few weeks, and that there's software running on top of that hardware, as sort of hinted at by last week's movie night. And tonight we'll start to focus on the internet: what it means to be a network, what the internet actually is, how it works, how it breaks, and some of the interesting, juicy details underneath it all, again all toward this end of understanding what's going on underneath the hood. So with that said, we thought we'd begin by building from the ground up. So the internet, in your own definition, as you understand it now: what is it? Unfortunately, yes, that is what we and others have taught. So a bunch of tubes. And more concretely, perhaps? OK, so fiber optic cables, certainly a component that you perhaps have heard is involved. But if you had to summarize it? All of you, even though you're in Computer Science E1, probably know more about computers than, say, a friend or your parents or a grandparent or someone of that sort. So if they asked you what the internet is, how would you explain it to them, given that you probably use it every day? Network of networks. Good. So it's a network of networks. OK, so being a little feisty, your grandmother now pushes back and says, well, what's a network? Yeah. OK, good. So a network is a bunch of computers somehow linked together. Am I about to get interrupted? No? No? I have not said anything inaccurate yet. All right, so I'll draw the old-school form of a computer, a monitor on top of a desktop, although clearly times are changing. So the simplest type of network we might have might be two computers connected by some kind of cable.
And back in the day, this would be a peer-to-peer network in the truest sense, both computers being peers connected physically by some cable. And this is back in the day, the 80s and 90s, when if you actually owned a couple of computers, or you and your dad did, in my case back then, or you and a colleague, you had to get some kind of cable that linked the two physically together. Well, this is only so useful, because very quickly did the world realize that there are rooms full of computers that it would be nice to connect together. So let's actually erase this line and ask ourselves... You're just hovering here. I am hovering. You should be holding the eraser. So let's take this wire away, introduce a third computer to the mix, and assume that these machines are now representative of, say, an office building or a home full of computers, and ask ourselves what it would take to actually network these things together. If I may interject now. Clearly you're about to. Go ahead. The way it used to work was that you could actually connect these two computers, and if you had a third one that you had to connect, you would actually run another cable from one of these computers. So essentially, it would just create this large ring. But what are some of the advantages, or perhaps disadvantages, of this sort of connection? Speed, OK. Well, yeah, speed, certainly. But there was something else I heard. Something that had to do with the cables. Well, sure, you have a lot more, OK? Because normally you only need one cable when you connect a computer to a network. But now, in this case, each computer needs two. But what if something else happens? Let's say this cable gets chewed or destroyed in some way, cut, for example. What would happen to the network? It would just go down. Everything. No computer would be able to communicate with any other. Maybe this one would be able to communicate with this one, but it perhaps depends on that. So is there a better way that we could do this? Well, OK.
Wireless, yes. Fast forward to this century. Cutting cables and the like. But is there a better way to do it with wires? Have them all connected to a center point, OK? Good. And this picks up where Rye left off. So as our alternative, especially in the homes these days, you're not linking your computers from one to the other because that, in addition to these potential gotchas, is also sort of a pain, right? Even if you imagine an individual computer lab, say in Harvard University, you've got a few dozen computers lined up, well, you could sort of daisy chain them all together. But it's kind of a headache if, for instance, you want to move things around. The cable length becomes an issue if you want to connect different rooms together and so forth. And just this idea of having one long snake of a networking cable that somehow links everything together doesn't really suggest much built-in redundancy. And so what the world has rather migrated to is this topology, this setup of wiring everything to one central point. In fact, those of you who have desktop computers in your home or in your offices, what's this central device typically call to which you do wire all of these devices? So it could be a server, but we're going to go a little lower level than that today. So a server is just another name for a computer that happens to serve up data to clients, as we'll call them. But let's talk, if you're familiar, at the networking level. So these are networking cables, otherwise known as a fiber optic if you have a really fancy home network. But for the most part, they'll be called, yeah, so cat five or cat six, or more generally, ethernet cables. And any of you who have a physical computer at home, like a cable modem, DSL modem, it probably came with a gray or a blue cable with a fat phone jack-like thing. That phone jack, if you're really curious, is called an RJ45 connector. And all that means is that just describes the type of connector. It's a fat phone jack. 
And that's what goes into the back of your computer and into your cable modem, your DSL modem. But that's all fine and good if you've got one computer. But if this represents the internet and this represents, say, your cable modem from Comcast, we've got a problem if we've got three computers in the home, whether laptops or desktops, because we want to somehow link them together. And how many jacks, ethernet jacks, ports on a cable modem are there typically? Yeah, just one. In fact, early on, only five, 10 years ago, when you did have cable modems from Comcast or Cablevision or RCN, they actually really resisted the idea of you having multiple computers in your home on the network at once. They would prefer that you do what, back in the day? Pay them twice as much to have twice as many computers. Honestly, they did not make this easy and certainly didn't encourage it. I wouldn't be surprised if some of the fine print in the service agreements way back when said you can't do this. But for various reasons, probably not the least of which was that it's kind of a hard force to stop technologically, because the world was inventing these home routers that were making it possible to share this connection, that resistance faded. Well, you can buy a device like this. And this could be one of these home routers. But more properly, if we want to be technical, it's something called a, anyone know? It could be a hub or, more likely these days, something called a switch. So a switch is just a little metal box that has an electrical cord that plugs into the wall. And then it's got multiple ethernet jacks on the back of it or the front of it. And the idea is that you use this as a central aggregation point. You've got a whole bunch of computers. You want to connect them together. You don't try wiring them all to one another. You instead plug each of them into this central point. Now there is one gotcha or downside of this approach versus Dan's from a moment ago. What might you argue is a, sorry? So speed, how so?
Interesting, so if we're now marshaling all of our information through one central point, kind of suggests bottleneck, right? If all this data is sort of arriving and colliding at once, there is a potential gotcha there. And in fact, when hubs were the devices of choice just a few years ago, that was a problem because you would actually get lots of collisions, literally, zeros and ones hitting each other on the wire, which is generally a bad thing. Switches, though, are actually a little smarter. And the neat thing about switches is that you can think of them if they have a whole bunch of jacks in the back. Each of those jacks has its own wire, and they each have two wires, all of which are crisscrossed in such a way that between every pair of ports there's a dedicated path. So in fact, if you buy a switch that supports a certain speed, you can actually get that speed between all possible pairs of computers, not just in total. I can interrupt really quick just to explain a little bit more the difference between a hub and a switch. So a hub, so if you have, let's say, two computers, or rather, let's do four computers. If you have four computers here, and let's say this computer, we'll just call it A, tries to send a message, or you can think of it as whatever you would like, but just it wants to send some message to this computer B. What it does is, since all of these computers are connected to the hub just once, A has to send its message where? To the hub, right? So the hub gets A's message, and then it resends it to everything on the network. So it sends A's message back to A, A's message to B, A's message to C, and A's message all the way to D. So you can imagine that, what would happen if A and B, for example, both tried to send a message at the same time? What would happen? Right, this is what is a collision. 
So the hub would receive two things at once and realize that it can't send out both things at once, and so it would have to tell A and B to wait just a few milliseconds, or nanoseconds, before they try to resend. But this is a really dumb device, as you can see. There's no reason why C and D should see A's message if it is intended only for this other computer. In fact, what's one of the implications of that, if the data is, in fact, being spit out to all computers, A through D? Security. So even here, back in the day on Harvard's campus, when the dorms were wired with these devices, hubs, which were high tech at the time, a student, a malicious student who might turn a blind eye to the school's administrative policies, could run what's called a packet sniffer, which is a program that literally sniffs traffic, zeros and ones on the wire. And because hubs just sent everyone's data to everyone, you could pretty much literally see what your roommate was typing on their screen next door, what emails were going out, and so forth. It was not all that hard. The only thing really standing in the way was the student's own sort of technological savvy. And so a switch, then, is a little bit smarter. You can think of it like how David explained it before, where it has, perhaps, a direct connection from each of these computers to the other, or you can think of the switch as being sort of a mini computer that analyzes the message to try to determine where it goes. And so that's exactly what it does. When the switch receives a message from A destined for B, it won't send it everywhere. It will send that message only to B. And so this not only increases security, but it also greatly reduces collisions. And it therefore also increases the speed of all of these messages being sent between these switched computers. And in fact, now let's try to put on the so-called engineering hat.
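To make the hub-versus-switch contrast concrete, here is a minimal sketch in Python. The function names and the four port labels are made up for illustration, and the switch is assumed to already know which port B is on. One simplification to flag: the sketch has the hub repeat a message on every port other than the one it arrived on.

```python
def hub_deliver(ports, sender):
    """A hub has no memory: it repeats the message on every other port."""
    return [p for p in ports if p != sender]


def switch_deliver(recipient):
    """A switch that already knows where the recipient lives uses only that port."""
    return [recipient]


ports = ["A", "B", "C", "D"]

# A sends a message intended only for B.
hub_ports = hub_deliver(ports, sender="A")
switch_ports = switch_deliver(recipient="B")

# Anyone other than B who receives the message could sniff it.
hub_eavesdroppers = [p for p in hub_ports if p != "B"]
switch_eavesdroppers = [p for p in switch_ports if p != "B"]

print("hub delivers to:", hub_ports, "possible sniffers:", hub_eavesdroppers)
print("switch delivers to:", switch_ports, "possible sniffers:", switch_eavesdroppers)
```

With the hub, C and D both receive a message that was meant for B, which is exactly the opening a packet sniffer needs. With the switch, only B does.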
So these topologies look pretty much the same structurally. You've got a central device, and then four things plugging into them. And yet, as Dan said, they sort of work fundamentally different. This one being very dumb, just spitting out the data on all ports and hoping whoever it's destined for hears it. Whereas this guy is smarter. But again, let's sort of think about how this might work. If you are plugging in still a very cheap device. These days, $10, $20 free. If you just ask a friend who no longer needs theirs. Well, you connect four computers, unknown previously to each other. And you expect that when A wants to send traffic to B, say it's an email you're trying to send to your mom who's in the next room all within your same home. Well, how does A know, or rather, how does the switch know where A, B, C, and D actually are? How could we implement that? Yeah, so one trick is that every computer on the internet, this is sort of one of the big lessons if you're unfamiliar today, is that every computer on the internet, small white lie, has an IP address. So an internet protocol address, which is just a fancy way of saying every computer has sort of like its own postal address. So just like one Oxford Street Cambridge Mass 02138 identifies the Harvard Science Center sort of uniquely in the world. Or your address uniquely identifies that little metal or plastic box on your door in which mail is supposed to go. So does an IP address in spirit identify a computer uniquely on the internet. And these IP addresses are of the form w.x.y.z. And then each of these placeholders between the periods is a number between what range, if you know? Yeah, so 0 and 255. Not something you need to or should know. But at least if you've seen these kinds of numbers now in your computer. And once we pull the screen down again later, I'll tell you on a Mac or PC where these values are. Well, this constitutes your IP address. So for instance, this is really useful. 
Well, actually, if we can nail this home, what are some common IP addresses that you see when you're setting up your new fancy wireless router, for example? Yes? 10, or 192. Yeah, very good. So 192.168., let's say, 1.100, for example. Another one that you'll commonly see is 10.1., I don't know, 3.4, for example. So these are all common internal IP addresses. And we'll talk more about those in just a little bit. But if you've ever seen some number that looks like this, or if you've ever had to call tech support and they ask you for your IP address and you relay this information, this is what we are referring to: the IP address that uniquely identifies your computer on your network. So to tie these stories together, why is it useful, then, for computers to have these numeric IDs? Well, you can kind of clump them together. So in fact, even though these particular ones are commonly used privately in the home, almost all of the computers on Harvard's physical campus have IP addresses of the form 140.247.something.something, y.z in this case. And the world has sort of standardized on Harvard and the Cambridge, Mass. campus, for the most part, having computers that start with that prefix of numbers. These things known as routers on the internet, which we'll come back to a little more next time as well, are devices, servers, that take data in on the internet and route it closer to its destination, ideally. They can use these numeric prefixes, kind of like lookups in an Excel spreadsheet, to find out, does this packet go this way, or does it go that way? And a router can answer that question just by looking at the first few numbers, because it has a table that says go left with this number, go right with this number, essentially. But we'll tease that apart even more next week. So let's apply this now. So in your home, each of your computers, assuming they have internet access by way of your cable modem, has an IP address.
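The table lookup a router does can be sketched with Python's standard `ipaddress` module. This is only an illustration under assumptions: the two-entry table and the "go left" / "go right" labels are invented, and real routers hold far larger tables. The Harvard prefix 140.247 is the one mentioned above.

```python
import ipaddress

# Hypothetical routing table: address prefix -> which way to send the packet.
ROUTES = {
    ipaddress.ip_network("140.247.0.0/16"): "go left (toward Harvard)",
    ipaddress.ip_network("0.0.0.0/0"): "go right (everywhere else)",
}


def next_hop(address):
    """Pick the most specific prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("140.247.10.20"))   # matches the 140.247 prefix
print(next_hop("18.1.2.3"))        # falls through to the catch-all entry

# The internal addresses mentioned above fall in ranges reserved for private use.
for addr in ["192.168.1.100", "10.1.3.4", "140.247.10.20"]:
    print(addr, "private?", ipaddress.ip_address(addr).is_private)
```

The point is only that the router never needs to know about individual computers. Looking at the leading numbers is enough to decide which direction a packet should go.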
But it turns out that each of your computers has another type of address. So in addition to an IP address, a computer also has another kind of address. Yeah, this thing called a MAC address. MAC, not to be confused with Macintosh. In this context, it means media access control. It's otherwise known as an ethernet address. So long story short, there's this duality of addresses on most any computer on the internet, whereby your physical ethernet card, so the little logic board inside your computer or the chip on your motherboard that implements all of the networking capability, it's called an ethernet card. It has a unique serial number, 1, 2, 3, 4, 5, 6, whatever. But it does follow a pattern. But your computer also, conceptually on a higher level, has this thing called an IP address. And whereas an ethernet address is hard-coded, burned into the hardware, and thus is supposed to remain constant throughout the life of your computer, an IP address can change. And the nice thing about this is because if every computer has a MAC address, ethernet address, that's sort of a fixed address that uniquely identifies it on a local network with other computers. But because IP addresses aren't fixed for the life of your computer, they can change. This is one of the reasons you can take your laptop to Harvard and have it work, to your home and have it work. Because on a high level, data goes from this network to another network using these numeric addresses. But as soon as you sort of zoom in on the switch in your closet or your basement or your attic or wherever you keep it, it's these ethernet addresses that are used within local networks. So for the most part, when your computers are talking in your home, they are still kind of using IP addresses. But more important, when the computers are physically connected by way of one of these switches or hubs or home routers, is that their MAC addresses, their ethernet addresses are used. The computer asks, hey, I'm computer A. 
I want to send data to computer B. Computer A has to know the IP address, it turns out, of computer B. But what computer A essentially does is broadcast to everyone on the local network, hey, I'm looking for computer B. What is your MAC address? Computer B says, oh, that's my IP address. My MAC address is 1234567. A hears that response, and then with that tidbit of information, what can he do henceforth? Just intuitively, perhaps. He can now send data not to everyone, but to that specific address. And the reason that a switch is smarter than a hub is because switches memorize these MAC addresses as they flow in and out. And they remember, oh, I saw this particular MAC address, 1234567, on this physical port, from this particular computer. So you know what, any time anyone, A, C, or D, sends data to that ethernet address, I'm not going to stupidly broadcast it to everyone on the network. I'm just going to send it out the physical port on which I saw that address earlier, whereas a hub has no such recollection of those details. So if I can nail this just a little bit better with perhaps a more specific example. Better? Yeah, better. Clearly. Anyway, so David was talking about how if you have a computer on Harvard's campus, you most likely will have an IP address somewhere in the range of 140.247.y.z. So let's say, for example, that you closed your laptop, went down to MIT, opened your laptop, and got an IP address there. It actually would not be the same IP address. MIT has its own range, 18.x.y.z. And, of course, x, y, and z will each be a number in the range of 0 to 255. And so your IP address will change depending on perhaps your location, or more specifically, which network you are connected to. What doesn't change, though, even though your IP address has changed, is your MAC address. And so if I can just elaborate a little bit more on that.
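The "memorizing" a switch does can be sketched in a few lines of Python. This is a toy model, not how any particular product is built: the class name, the port numbers, and the MAC address strings are all made up for illustration. The only real convention used is the all-ones ethernet broadcast address, which a "who is computer B?" question would be sent to.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # ethernet's "everyone on this network" address


class LearningSwitch:
    def __init__(self, port_count):
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Remember where the sender lives, then decide which ports to send out."""
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]          # known: one port only
        return [p for p in self.ports if p != in_port]  # unknown or broadcast: flood


# Hypothetical MAC addresses for computers A (port 0) and B (port 1).
MAC_A = "aa:aa:aa:aa:aa:01"
MAC_B = "aa:aa:aa:aa:aa:02"

switch = LearningSwitch(port_count=4)

# 1. A broadcasts: "I'm looking for computer B. What is your MAC address?"
question_ports = switch.receive(0, MAC_A, BROADCAST)

# 2. B answers A directly; the switch already learned where A is.
answer_ports = switch.receive(1, MAC_B, MAC_A)

# 3. From now on, A's data for B goes out B's port only.
data_ports = switch.receive(0, MAC_A, MAC_B)

print(question_ports, answer_ports, data_ports)
```

A hub would behave like step 1 for every single message, because it keeps no table at all.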
When we talk about MAC address, we're talking about all capitals, M-A-C, MAC address. And very frequently on the interwebs, on blogs, and such, you see many people referring to Macintosh computers as MACs, capital M, capital A, capital C. That's not right, they're confusing two different things. So if you're talking about the computers, it's capital M, lowercase ac. If you're talking about the specific Ethernet address, it's all capitals. So just to place yourself, you can place yourself at a sort of higher level than all of these other people that now just capitalize everything if you want to do that. So I'm gonna try to scare with just a tiny bit of math now that we keep drawing each of these IP addresses on here. So kind of to tie together our first couple of lectures on hardware with this one. Just to sort of demonstrate that this stuff sort of doesn't go away. We're just gonna keep building on it. If every IP address is of the form w.x.y.z, and each of those placeholders can be a number from 0 to 255, how many bits does that imply each of these placeholders is using? If you can go from 0 to 255, or 1 to 256. Sorry? Eight. Eight, right? So if you think way back three weeks, if you do like, all right, let's see, 2 to the 8 is 256. So that means with 8 bits, you can represent any of 256 values. If not, just take it for granted tonight that to represent 0 to 255, you need 8 bits of expressiveness. So that's 8 bits plus 8 plus 8 plus 8. That's 32 bits. So IP addresses are 32 bits long. Seems pretty big, because what did we say 2 to the 32 is? That is how many possible IP addresses are there in the world if they're 32 bits long. What's that? Way more. A little less, a little less you've overbid. Roughly four billion addresses. So that's a lot. But as we've all read, there's sort of an, is it though? I mean, how many people that are in the world? That's a lot. I don't have four billion of anything. Seriously, how many people are there in the world? 
Somebody should know. 6.3 billion people. So we clearly don't have enough IP addresses for every person. Big problem for One Laptop per Child, right? So what's the implication of this? So this, once upon a time, was a huge number, especially when it was just a few geeks on university campuses using the internet, and there were plenty of addresses to go around. But I mean, even think of the growth rates of certain countries, even China. And in fact, there was an article just recently on this popular news site called Slashdot, the title of which was "China to run out of IP addresses in 830 days." So given the growth rate of their population and the percentage of those people that are actually on the internet, well, this is a prediction that within just a few years they'll run out of the IP addresses that the world has said, China, you can use. And among the discussion here are comments like these. China is running out of IP addresses unless it makes the switch to something called IP version 6. According to the China Internet Network Information Center, under the current allocation speed, China's IP version 4 address resources, today's addresses, can only meet the demand of 830 more days. If no proper measures are taken by then, new Chinese netizens will not be able to gain normal access to the internet. So again, if we can ask you to put sort of your engineering hats on, how would you solve this problem if you could? If you could address that particular concern, what would you do to solve that problem? All right, so add two more octets. And an octet, just what do you mean? Yeah, so a couple more placeholders, right? So if 32 bits is clearly too small, well, we just need more bits. And so the world has actually been planning this for years. The IP version 6 specification (version 5 never kind of saw daylight) uses 128 bits for addresses. And I can't quite do this one in my head. 2 to the 128, we'll say, is huge. 3.4 times 10 to the 38.
3.4 times 10 to the 30th? 38. So even that, I don't really have words to express. That's a lot. So a lot. So is that a solution, do you think? So yes, perhaps that seems to give us quite a bit of buffer time, assuming our own growth rate doesn't go completely through the roof here. And even then, that's a lot of humans. So that's good. We've found the upper bound, then, on the number of IP addresses we need. But now push back on this. It's one thing to sort of pontificate in the basement of a computer building as to how you can fix this. What's an obvious gotcha? Right, so you kind of have this problem: how are you going to get the four billion other people who are sitting in front of their computers to sort of switch to a fundamentally different addressing scheme? So one of the reasons this is taking so many years to happen is the non-triviality of actually changing the fundamental operations of the internet. Fortunately, some devices these days, sort of fancier routers and whatnot, can support both types of addresses. So it will be gradual. And the world does seem to have a way of figuring out how to deal with these sort of apocalyptic messages, even if it's as tame as we're going to run out of IP addresses. But the solution here is just to have... So Wikipedia has actually given us a helpful tidbit of information that will help you... That's a lot of computers. Put this into sort of concrete terms. So for every person in the world, IP version 6 will allow 5 times 10 to the 28th addresses. So that's 5 with 28 zeros after it. So you could have a lot of iPhones or a lot of your computers just sitting around doing nothing. Yes. So that's a good point. So why don't we reuse IP addresses, like we reuse phone numbers? So remember that one of the things that we mentioned is that you can actually change IP addresses even with the same computer.
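For anyone who wants to check the arithmetic from the last few minutes rather than take it on faith, here it is in Python. The only outside number is the population figure of 6.3 billion quoted in class. Everything else follows from the bit counts.

```python
# Each of the four placeholders in w.x.y.z is 8 bits.
values_per_placeholder = 2 ** 8
ipv4_bits = 8 + 8 + 8 + 8

ipv4_addresses = 2 ** ipv4_bits   # 32-bit addresses
ipv6_addresses = 2 ** 128         # 128-bit addresses

population = 6_300_000_000        # figure quoted in class

print("values per placeholder:", values_per_placeholder)
print("IPv4 addresses:", ipv4_addresses)
print("enough for one per person?", ipv4_addresses >= population)
print("IPv6 addresses: about", f"{ipv6_addresses:.1e}")
print("IPv6 addresses per person: about", f"{ipv6_addresses / population:.1e}")
```

So 32 bits gives a little under 4.3 billion addresses, fewer than one per person, while 128 bits gives roughly 3.4 times 10 to the 38th in total, or about 5 times 10 to the 28th for each person.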
So you can move your laptop from Harvard over to MIT and you'll have to get a different IP address. So most setups usually have ranges of IP addresses that are given to a particular university or to some particular location. So for example, entire continents have ranges of IP addresses that are devoted to them, for example, and they can only give out IP addresses in those ranges. Similarly, MIT can only give out IP addresses in the range 18.x.y.z, for example. And so what we're running out of isn't perhaps the fact that every single one of these IP addresses are being used at one particular time. It's that we are now giving out so many ranges of IP addresses that we are running out of IP addresses to give. So that's really what's happening in sort of simplified terms. Okay, so to summarize, with respect to connecting machines physically, we have a couple different devices. Switches are really what people use these days. Though the thing is when you buy one of those home routers, so to speak, or the wireless routers, these are these all-in-one devices these days. Not only is it a router, it can take data from inside your home and send it out on the Internet and vice versa. It's also a firewall, which keeps bad stuff out and good stuff in, more on that in our security lectures. And it's also a switch if it has multiple ports so that you can connect multiple devices, your Tivo, your desktop computers, maybe even your laptop if it's not wireless, all into one thing. So even though we can sort of distinguish the roles played by these pieces of hardware, there's often everything but the kitchen sink thrown into those devices today. So if we transition back to our original diagram, let's now assume this is a switch, but it's also a router, it's also a firewall, and it's also wireless. It's going to be called an access point or AP. This is what you would yourself buy for $20 plus at a store or online these days for your own home. 
For those of you who have wireless routers, just to tie this together to reality, what are the brands you probably have? Linksys, Netgear, D-Link, Apple, Belkin. So there are a lot of consumer brands out there. Go to Best Buy and you'll see a few of them, and plenty more online. Well, we now have a device like that, and I'll add little bunny ears to suggest that it's also a wireless adapter, so that we can similarly have, say, someone's laptop connected wirelessly here. But we've now got to connect those computers out to the rest of the internet. So how many of you, just to get a sense of the data, have cable modems at home? Okay, all right, so about half. And DSL modems? Okay, so just a few. And we won't ask who's sort of not on the net just yet, but we'll assume you all are for now. So somehow we have to connect this device to this device, which might be a cable modem or a DSL modem. More on those next week, probably. A cable modem uses the coaxial cables coming in from your cable company. DSL modems use? Yeah, your phone jacks, from Verizon or whomever. So you pay different people for the service, but either way you literally run an ethernet cable, the same type of cable, to a special port on your home router. Usually there's what's called a WAN port. So one of these jacks will be labeled WAN. Not that it's important, but what does that stand for? Yeah, wide area network. So that's the port that connects you to the wider world, so, everything else on the internet. Whereas each of these smaller ports, to which the other three computers are connected, is called a LAN port, for local area network. There's no formal definition of these things other than to say that a local network is like the computers in your home and a WAN is like the computers in your neighborhood or the computers in your continent. It's really context dependent. So we now have this device wired up here, and we've got this device wired up to our ISP, our internet service provider, by way of a cable modem or DSL modem.
So where do the IP addresses come into play? So who gives you an IP address? Yeah, so your ISP is responsible for giving you an IP address. So Comcast owns several thousand, maybe a few million IP addresses that they're responsible for managing. When your cable modem powers up and it talks to Comcast's physical network, one of the first things it does is it says, hey, I'm alive, give me an IP address. And it assigns it an IP address using a protocol, a language called DHCP. So we'll also see this perhaps on a computer screen in either tonight or tomorrow. But dynamic host configuration protocol, not so much important to remember what it stands for, but this is just a protocol, a language that computers speak when one wants to get automatically an IP address from the other. And this is in contrast to just a few years ago where some of you, when the Comcast guy came or you received your instructions in the mail, you might have had to manually configure your computer to use a specific IP address that was on some poorly printed piece of paper. Kind of a nuisance because it also makes it hard for the cable company to change things, for them to troubleshoot things perhaps. And so these days, pretty much everyone uses this protocol, this language, so that it all happens automatically without the human having to care what his or her IP address is for the most part. Dynamic host configuration protocol. So this is a really nice protocol because this is the thing that will allow you to move, for example, from Harvard's campus to MIT and get a different IP address depending on the campus that you run that you are on. So we can continue with our sort of mailing address analogy where DHCP, there's some higher power that, for example, the postal service that tells you your address when you move. 
So for example, when you move to a new location, you ask what your address is and you are given an address and that is the address that is used for other people to mail things to you or the address that you use as a return address when you mail things out. And similarly, you ask or your cable modem asks or your computer asks this central switch what the IP address should be and some IP address will be chosen and therefore given to you. Where this breaks down is that you can actually get a different IP address. Your cable DSL modem can actually obtain a different IP address every 30 days or so because it does actually have to ask the server if it's still using an IP address that's okay. It just wants to make sure that there's no two people that are using the same IP address but still it's a pretty good analogy I think from that standpoint. So Dan mentioned these IP addresses that many of you sound to be familiar with, 192.168. So it turns out that in as much as every computer on the internet has to have an IP address, kind of a problem if Comcast is only willing to give you one IP address and you're only willing to pay for one IP address. So one of the neatest things that these home routers do, again this box that we've drawn in the middle here that does everything is among the things it does is it allows you to assign fake IP addresses, internal IP addresses as Dan called them to each of your home computers usually of the form just by convention 192.168.something.something and then what it does is it takes on the identity of the cable modems assigned IP address. 
So Comcast sends down an IP address, and my home router takes on that IP address as its own. But then that router gives all of the computers in my home these unique IP addresses that look completely different, probably, from that external public IP address. And as you might imagine, just intuitively, this guy's purpose in life, in addition to all that other stuff, is this: if I make a request for, like, cnn.com from my computer here and it goes out on the internet, well, my home router sees it is coming from 192.168.1.100. Well, he converts that by rewriting the data, the bits that are going out, with his IP address, and sends the request out to CNN. CNN sees the request coming from my cable modem. CNN replies with the day's web page for the news. That web page reaches the home router, and what does the router probably have to do then? It has to sort of reverse that process. So he's got some kind of RAM inside of him where he remembers to whom he sent a request and from whom it was sent, so that he can then rewrite the IP address again on the way in, so that when he then spits it out his port, it comes back directly to my computer. And this is what's generally known, if you ever see the acronym, as something called NAT, network address translation, and the acronym itself kind of says what's going on there.

This is a fancy technology, this network address translation. Say we have two computers that are connected to a cable modem, either through a switch or some home router or something like that. This cable modem is given one IP address from the ISP. Let's say it's 24.x.y.z, for example. And now let's say that, without this network address translation, both of these computers were given this same IP address. What would be the problem in this sort of scenario? Sorry?
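The rewrite-and-remember behavior described above can be sketched as a small lookup table. This is a simplified model under stated assumptions: real home routers typically tell connections apart by port number, which the lecture has not introduced, so the ports here are an added detail. The class name `ToyNat`, the public address `24.1.2.3`, and the server address `198.51.100.10` (from a range reserved for documentation) are all hypothetical.

```python
class ToyNat:
    """Toy model of a home router doing network address translation."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_public_port = {}   # public port -> (private ip, private port)
        self.by_private = {}       # (private ip, private port) -> public port

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite a packet leaving the home so it appears to come from the router."""
        key = (src_ip, src_port)
        if key not in self.by_private:
            # Remember who sent this, so the reply can be routed back later.
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_private[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Reverse the rewrite for a reply arriving at the router's public address."""
        private_ip, private_port = self.by_public_port[dst_port]
        return (src_ip, src_port, private_ip, private_port)


if __name__ == "__main__":
    nat = ToyNat("24.1.2.3")
    request = nat.outbound("192.168.1.100", 5000, "198.51.100.10", 80)
    print("on the internet the request looks like:", request)
    reply = nat.inbound("198.51.100.10", 80, request[0], request[1])
    print("inside the home the reply looks like:", reply)
```

The two dictionaries play the role of the router's memory: one is consulted on the way out, the other on the way back in. A reply to a port the router never handed out has no entry, and in this sketch simply raises a `KeyError`.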
That's what the computer does: it realizes that there's another computer on the same network that has the same IP address. But what does this mean? It helps to think of it in terms of this analogy that we've been going back and forth with. Right, well, that's what happens: it blocks it from accessing the internet. And that's because if some packet of information, some message, needs to come from the internet to one of these computers and they both have the same IP address, it's just like what would happen if the post office had two different houses with the same mailing address. It wouldn't know where to send this envelope, this little packet of information. So this won't work. Two computers on the internet cannot have the same IP address. So this is one of the many reasons why we are running out of IP addresses: we can't give the same one to multiple people. But this is also where the power of NAT really shines, because now we can use only one IP address, and our router or switch will be able to act as this translation device that can convert that one IP address into multiple private or internal IP addresses.

And so there's at least one piece of the puzzle we can now fill in, because many of you, some even here tonight, have laptops, and this is certainly in vogue these days. So how now, if you don't have a physical wire, does all the same technology work? Well, when you connect to a network, whether here at Harvard or at home, what's the very first step you have to do the very first time you arrive on some wireless network? You have to connect to the network. But what does that mean? Where do you go, whether you have a Mac or PC? What do you do physically? So you're asked sometimes for a password. But first and foremost, if you go to, like, a Starbucks hotspot where they have free or pay-as-you-go internet access, how do you get on that wireless network? Like, usually
there's some kind of menu maybe at the bottom right of your screen or up near the top right of the screen in a Mac where you have to pick the wireless network and the names of networks are generally known as SSIDs which is just a fancy way of saying like a pretty unique name for that network so at Harvard those of you who have been on Harvard's network what's the SSID of Harvard's wireless network yeah so Harvard University pretty sensible how about at home what do any of you have interesting names for your wireless routers that you're willing to share what I always see is Linksys everybody has Linksys it's very popular everyone seems to pick this name for some reason okay so all these wireless routers if you're going to use them wirelessly have some kind of name and that's so that when you're in an apartment building neighborhood multiple people can own these devices and you can connect to different ones based on their name so this same thing in the middle here which now has antennas and serves as an access point an AP allows laptops to connect by providing one their SSID but two and sometimes an additional password so for years and this is not all that long ago you would buy these Linksys devices which is just a very popular consumer company now owned by Cisco blue routers typically they would be preconfigured with some default settings like the name of this router will be Linksys and that alone is not such a bad thing but they would also be they would also ship without any form of passwords and they wouldn't make it terribly easy to enable passwords on these routers plus for years it wasn't very easy on definitely windows to type in these passwords because you kind of had to jump through some hoops so as a result you was sort of a newbie buy this router, plug in it at home it does work out of the box pretty much all the time and then you sit down at your laptop you go through the list of wireless networks that are nearby and choose the SSID called in this case 
Linksys, and it just works. But what's maybe a downside there? Yeah, it works for that kid next door too, or anyone who happens to be driving by, or anyone proximal to your home or to your office. And maybe you don't care. Maybe you don't mind if someone's leeching off of your internet access. But you should have some concerns that we'll come back to in our security lectures. Like, well, maybe they'd just like to read your emails and your instant messages, which are not encrypted typically. Maybe they want to download movies illegally. I mean, most of us have probably read articles about some little old lady who's getting sued by the RIAA for downloading music illegally, even though it was maybe her nephew or someone in the home or just someone next door who happened to have access to her network. So there are these gotchas as well.

So, I mean, you can think of an access point as sort of a wireless hub. And it sort of needs to be, because when it's sending data out to a computer, it can't focus the beam on the exact location of the computer. It has to send it out everywhere. So now we get the same sort of problem wirelessly that we had with wired hubs in terms of security, where not only does your computer receive the message, but so does everybody else that's near the access point, whether or not, frankly, they're connected to it. And if I can add to what David was talking about, how you may want to consider password-protecting your access points so that, for example, Joe Schmo over here can't just read your email that's being sent back and forth: realize that using a password doesn't necessarily mean that it's safe. We'll go over it more in the security lecture, but it's now very easy to crack some of the password schemes that exist by default on wireless access points. So be careful. It's sort of a teaser for what we'll talk about more in the security lecture.
Yeah, even I was kind of blown away a couple of years ago now, when one of my college friends and I were helping move his brother into a local apartment in Cambridge. My friend, being a bit of a geek and sort of knowing how to do these things, and I really just knowing that they can be done, he had this wonderful demonstration of cracking wireless passwords, because he had his Mac laptop in his brother's new apartment and he wanted to check his mail, or I think we wanted actually, literally, to look up a movie listing. It's very easy to say your friend did all this, too, right? So my friend had this special software on his Mac, and I forget the name. Well, we'll come back to that in our security lecture. It's this popular Mac program where literally you click start, and if you let it run for enough seconds or enough minutes, and there are enough other people nearby using the wireless network and sending lots of bits across the air, and that's lots of encrypted packets, well, the protocol that a lot of routers still use is called WEP, which is, in a word, broken. And so simply by gathering several hundred thousand or millions of packets, which don't take terribly long to flow across the air, my friend was literally able to have this software say, the password for the wireless access point is dot dot dot. He typed it in, we found our movie listing, and I don't necessarily condone this. We just wanted to go to the movies, and we, you know, headed out to the square. But it wasn't that hard. And what's sort of compelling about this is that even though, yes, my friend has the sophistication to have written software like this, it was a lot easier for him, and for all of the so-called script kiddies out there, people who don't really have skills but do know how to download software that other people wrote. That has made it so easy for even the kid next door, or the adult next door, to do malicious things, because other people have provided the tools and have made the software
available on the internet. But you do at least have one recourse. Almost all routers these days also support something better than WEP called WPA. So if you have a home router, and your laptops and your router are, say, within a couple of years old (because with anything a few years older than that you might run into compatibility problems), well, you want to find this setting if it exists on your router. Usually you go to, like, a special web page like http://192.168.1.1. Usually your manual will tell you where to go. And you can turn this feature on, and you'll choose a password, the longer the better. And so long as it's long enough, you should be pretty comfortable doing things like online banking and sending emails and instant messages, even within a wireless network at your home.

It's tricky. WPA and its even better brother, WPA2, although they are more secure, actually require more processing power to function because they are more secure. So if you have a somewhat aging laptop, you may notice that it struggles a bit more, so to speak, on one of these newer secure networks, such as WPA or WPA2, compared to these less secure networks that use WEP or even have no protection at all. So that's yet another thing to take into account. However, if you have a relatively new laptop, you really shouldn't worry about it. And if you want to secure your network, the clear choice is one of the WPAs over WEP. Do not use WEP unless one of your laptops requires it because it is a certain number of years old and does not actually support WPA.

So to summarize: you've got a desktop computer, a laptop, or maybe several of them because you live with some roommates or family. You want to get everything on the internet. Well, you probably call up Comcast or Verizon or someone else to get DSL or get cable. The guy will come and install that thing, and it will have literally an Ethernet cable and a jack waiting for you to
plug something in if you want to plug multiple computers in you're going to need to get a home router from any of the companies we rattled off you plug buy it for $20 maybe a few more dollars you then wire up your desktop computers with identical Ethernet cables to its ports assuming you have enough if you've got any wireless laptops you're going to have to configure this thing so that you know what the SSID is and or the password ideally follow the manual that comes with the thing but usually it'll be via a web-based interface and you do that by first using a desktop computer or temporarily connecting your laptop with a wire to the router because otherwise you have a chicken and the egg problem how can you configure the wireless so you have a physical cable for that and then you can remove this thing ultimately and if you ever have friends or family come over and they too want to hop on your wireless network and it's password protected with ideally WPA you just have to tell them that password is relatively low risk of that assuming you're not worried about them trying to sniff your emails and other things that aren't encrypted that you might be doing within the home so not all that much too at lots of options but for the most part a lot of it's just marketing these days all the different brands and models although there are differences in speeds which we'll probably come back to later tonight or next week should we take a break? 
Okay, welcome back, everyone. So before the break, we were talking about all sorts of stuff: IP addresses, and giving you all sorts of new acronyms, NAT and DHCP and all of this sort of stuff. And before we begin throwing even more acronyms at you, I just wanted to show you a comic from a favorite web comic of mine called XKCD, where the creator actually made a map of the internet, so to speak, of the IP version 4 space. And so if we zoom in, you can see this is from 2006. It's a couple of years old now, but it's pretty accurate. If we were to zoom in, this is referring to all of the class A addresses. And what I mean by that: if we have W.X.Y.Z as an IP address, when we are talking about class A addresses, we're talking about this first octet, this first number, or W. We're talking about how all of these IP addresses are assigned. So you can start assigning at the class A level, or you can assign subnets including this X, or even further down to this Y. But this is just the map of the class A addresses. So we can see a couple of big companies here. So, for example, number 9 is IBM. So IBM owns all of the IP addresses in the range 9.X.Y.Z. And you can see a couple of other things here. So just a quick plug for MIT up there at 18. MIT is the only educational institution that owns an entire class A space. Even Harvard doesn't have its own class A space. And so we have some other interesting things here. You can see cable is at 24, for example. Let's see, what are some interesting ones? HP has 15. Ford has 19. What's particularly interesting is when you see blocks devoted to entire countries or continents. So you can see that a portion of Europe has a little blip down here at 62. You can see that there are some specific areas for continents. Let's see, so here's North America down here. Japan has 136. 97, 98, 99. Japan has 137. A bunch of interesting things. And if you were to read the tool tip, which is
the thing that comes up when your mouse hovers over the image, it actually says that for the IP version 6 map, imagine the Windows XP default screen. Remember, it's just that big green hill. The green blocks here have not yet been assigned. This was two years ago, and so many of them may actually have been filled in.

So we're talking about how, in order to do anything on the internet, or in order to contact one server on the internet, you have to be able to know its IP address. But when we go to a website, we don't type in the IP address. What do we type in? A URL. What is a URL? What does it stand for? Right, uniform resource locator. And so there must be some way that we then transform these URLs into these numbers that the computer understands. So how might this work? Oh, is that your handoff? Yeah, that's my handoff. I'll punch you.

So how might that work? So you pull up a web browser and you type in cnn.com, which is clearly not a numeric address. But we claimed in the first half of lecture that every computer on the internet has a numeric IP address which uniquely identifies it. So it turns out that there's this other technology, which is what really makes the internet much more user friendly, known as the domain name system, or DNS. So when you connect your computer up to an ISP or a cable modem or a DSL modem, and that cable modem gets an IP address, it turns out it gets a few other things as well automatically via that protocol called DHCP. So that protocol not only asks Comcast for your IP address, it also asks for the IP addresses of one or more DNS servers, domain name system servers. And these servers' purpose in life is to translate host names or domain names like cnn.com to, take a guess, IP addresses, and vice versa. So they literally translate one to the other. Now why is that a useful thing? This seems to add some unnecessary complexity. Now we need yet more technology, more databases doing a mapping from one thing to another. Well, why is this advantageous? Yeah, absolutely. I mean, try remembering cnn.
Well, we can't call it that. CNN, visit us on the web at 1.27.36.2. This is kind of how the world began with telephone numbers, but the world quickly realized that it's useful to map the letters A, B, and C to the 2 button so you can call 1-800-COLLECT instead of 1-800-2 and then whatever the rest of it is. So this mapping of human-friendly addresses to computer-friendly addresses, IP addresses, is useful. And again, we'll come back to this next week when we talk lower level about routers and IP addresses and tracing the routes thereof. IP addresses are useful for computers because, as we hinted earlier, it's very easy for a computer to look at the first few digits of a number and say, oh, this data belongs that way. But it's much easier for a human to remember user-friendly names like cnn.com and harvard.edu. And it's these servers, which exist generally with your ISP or, in the case of campus, within Harvard's network, that get asked automatically, thanks to your operating system supporting this whole translation in the first place. When you type in cnn.com, your browser then asks the operating system, open up an internet connection to cnn.com. Well, your operating system, Windows or Mac OS, does its thing lower level and says to the local DNS server, Comcast's or Harvard's, hey, what's the IP address of cnn.com? That server ideally replies with the address. And what your operating system does on the browser's behalf is send out, like, a little digital envelope containing a request for a web page, and that request goes out on the internet with what in the To field of the envelope?

Well, that's unnecessarily confusing. No, it's not. It shows you exactly what sort of happens in the background. It goes out on the internet with one of these addresses. So what Dan pulled up is an unnecessarily complicated chart mapping cnn.com to its four IP addresses, it seems. So probably for redundancy purposes, for scalability purposes, because CNN is so popular, it looks like they have multiple
IP addresses mapping to the same name, cnn.com. That's not uncommon. And so this is actually the database that's stored in a DNS server that we're seeing here, which teaches the DNS server what this mapping is. So again, my operating system now knows the IP address. It puts the request for, give me cnn.com's daily news, so to speak, into that virtual envelope and then stamps the IP address of cnn.com in the To field. And just conceptually, what address probably goes in the return corner of that digital envelope? Yeah, my IP address. And if I'm within a home network, it's going to be my fake address for the moment, 192.168. But the moment that envelope gets sent out through my home router, it quickly gets rewritten with my public address. It goes out, comes back with the day's news, the router quickly fixes it, puts the address back to the 192.168 address, sends it to my computer, and voila, I have the day's news.

And just to give you a hint as to where we're going later in the course, when you'll start writing your own web pages: well, when I say that my browser requests the day's news, what does that mean? Well, let's be more precise. It means my browser's requesting CNN's home page. Well, what does that mean? Well, generally a home page is implemented, that is, it's stored, in a text file, a very simple file called, often by convention, index.html. So your teaser for tonight is HTML, Hypertext Markup Language, one of the languages in which web pages are written. And though it would probably look like Greek to most of us tonight, it will by term's end be much more accessible. This is just a text file called this. And so literally, when I say that your browser asks for the day's news or the course's home page, what it's sending in this virtual envelope, so to speak, is literally this special command, GET. And then it's saying what you want to get: well, I want to get /index.html. And then it will usually have some other information, like the version of the language it's trying to use. Because whereas HTML is the language in which web
pages are written, what do you think is the language, just based on your own personal experience, that web browsers and web servers use to talk to one another? Sorry? Binary, in some sense, but higher level here. What do you always type into your browser? HTTP. So that thing you've been typing sort of by rote for years now, http://, is actually a hint to your browser as to what language it should use to talk to whatever you type after the slash slash, say, cnn.com. So this little snippet here, GET /index.html dot dot dot, is an example of an HTTP message, meaning Hypertext Transfer Protocol. And that's just a fancy way of saying that's the language, the standard, that web browsers and web servers use to communicate. And they use that language to exchange yet another language, in the form of, like, a piece of paper with code on it, called HTML. And we as a class will learn HTML later in the semester and actually make web pages.

So what Dan just pulled up here is the result of essentially sniffing his own web traffic with a special program, so that when he pulled up, in this case, cnn.com, normally you'd see the day's news appear in Safari or Firefox or whatever. But what this is showing us is what's going on behind the scenes, what's being sent in that so-called virtual envelope or digital envelope. And it's actually a lot of information, but these are hints, these are standards that the web browser and server are using to kind of teach each other, you know, what you should send back, what languages I support, what features I support, and so on. And notice the most salient line is the second one up there. That is literally the command sent by the browser to get the default page. In this case it's slash, but usually, if you don't specify a file name, it's assumed to be that guy there, index.html. And it's that guy you'll be writing later in the semester.

So where do you get these things like cnn.com in the first place? Do you mind switching me over? So one of the... Voila. Voila. Welcome to
understanding computers and the internet my defense okay why don't we do this idiot I'll come over here and you go over there that's that's what we call a work around alright so we do have the days news here what I'm going to go ahead and pull up though on this site is a disaster of a website because at every step they try to upsell you but one of the things you'll do to courses and for the final project is by your own domain name and a domain name is things like cnn.com harvard.edu and their standards these things follow but for just looks like 199 if you read the fine print can you buy your own domain name so we for instance bought a few months ago computerscience1.org and we actually pay a third party a company to host our website for us even though we wrote it ourself we wrote the html stuff ourself we're paying someone else to manage the server and we essentially configured our domain name computerscience1.org by way of all this fancy dns stuff that we talked about to point to the IP address of that person we're paying to host our server we're using his IP address as opposed to something that belongs to us now GoDaddy is probably the site we'll point you toward just because they're so popular but again they want to have this fixation on using pretty girls to sell domain names as down here and such and there's even more and they also try every time you try to check out they ask you do you want to also buy this would you also like 100 email accounts with this and so constantly you're pestered but if you forge ahead you'll finally get to $10 for a domain name but just to show you what kinds of things are possible we can look here at this drop down so back in the day the domain names that you have access to as a consumer were .com and what else .net and .org so despite what you might think to be true anyone can buy a .com .org or .net or in fact any of these others so in fact one that's kind of neat is .tv what is this referring to do you think oh yes so it turns 
out this is not referring to television although it has that nice resonance in English speaking countries turns out the small Pacific Island nation of Tuvalu years ago decided to turn a profit by selling off domains within their countries domain so in addition to .com and .net .org .gov .edu the authorities the volunteers that run the internet years ago decided that every country was going to get a two character country code so ours in the USA is .us there's .jp for Japan .uk for the UK now the US kind of planted a stake in the ground early on thanks to our military literally and so forth whereby we tend not to use our country code .us all that much but most other countries do well Tuvalu decided that they could turn a profit by letting anyone off the street off the internet buy in their domain so even we have computer science one dot TV just because it's for our podcast and the videos and it sounds kind of cool if nothing else we pay more for it I think it's nineteen ninety five instead of six ninety five we think it's worth it oh you're hovering so all right well there's a whole bunch of other ones here like you can buy your own domain name in Asia .webs.ws and a whole bunch of others but again what you'll do at course's end is we as a course will pay for a third party web hosting company will create accounts for all of you for several months so not just for the final project but for several months afterwards that you can live on the internet beyond the course and gradually transition yourself if you're so inclined to your own host and we'll have you buy a domain name and then construct your own website for your final project using that domain name and you can have your own email addresses your own databases if you want to get fancy but in general we'll teach you to be independent even of sort of a course framework because we'll actually use a real what live web host along these lines so remember this sort of separation and one of the things that confused me for 
a long time early on before I was really familiar with a lot of this stuff was that was the difference between a domain and a host and it didn't really make a lot of sense to me so just remember that a domain is just the name so in this case godaddy.com or cnn.com where you go and you purchase essentially the rights to that name so essentially you're just purchasing that those those collection of characters within some specific top level domain and the top level domain is the .com the .tv the .net that we were talking about before a host is actually a server or a computer that exists somewhere on the computer that hosts your website it actually contains the files that make up your website or whatever you want to host on the internet then through the power of DNS do you point your domain name to the IP address of this host so what that when someone visits your domain so cnn.com or godaddy.com from their web browser the DNS server looks up the IP address of that host and then is referred to that host through that IP address so you can think of the DNS server just very much like if we can continue on with our aging analogy it's sort of like a white pages or a yellow pages where you want to know the address of Google for example so you open up this big yellow pages that is the DNS system and you are given the address of where Google resides for example and then you are able to send this envelope containing the information or in this case this get query that we want to this HTTP get query that where we're saying that we want to receive the web page to that address and then they ship back to you the data that you are requesting what are you doing to my computer just playing you want to keep going you have used domain names in many contexts so not only visiting web pages but also in what other context as well yeah so email so just to toss out a couple basics because there's always something new here may I browse the internet over there for you so we have domain names that 
typically follow a canonical form so we have let me make this a little bigger so we have generally domain names are of the form domain dot TLD and what do we actually mean by TLD do you think yeah this is top level domain so the notion of a domain is divided into parts and the thing all the way on the right is generally something like dot com and dot gov and dot net and those kinds of things or country codes like dot us or dot jp and so forth the domain is something like harvard in harvard dot ed or CNN in CNN dot com but generally folks if they want to describe the domain name they typically include the first word and the TLD and you just describe it as one big thing though some some in some context there's these things called sub domains where you could actually see addresses like this in fact who do you know that has an email address that actually includes a sub domain so it's something at something dot something dot TLD what's that who who do you know that has an address like that so they extend oh myself right so mail in at post dot harvard dot you do by contrast Dan Dan Allen at MIT dot edu doesn't have such a subdomain so why do these things exist do you think subdomains why what's that cheaper but harvard owns harvard dot edu so it's not paying any marginal cost to have post dot harvard dot edu law dot harvard dot edu med dot harvard dot edu yeah pretty much mental organization logistical organization so that different cis admins can sort of own different subdomains so post dot harvard dot edu is managed by the alumni association fas dot harvard dot edu is by the faculty of arts and sciences and so forth just helps you subdivide things and so the depiction here Dan is drawing up is to give you a sense of sort of it's like a menu of domain options we can sort of piece them all together so we have post dot harvard dot edu we could have cnn dot com cnn could if it's so desired have various subdomains but it's really entirely within their own control but what 
about some... So another thing, and I designed it like this to point this out, is that, sure, the one that we typically know is harvard.edu, but that doesn't mean we can't have a harvard within a different top-level domain. So one of the most famous examples, I think, is that for a while there was whitehouse.com versus whitehouse.gov. And I think whitehouse.com (someone's been on the internet), I think whitehouse.com for a little while actually was inflammatory towards the occupants of the White House, and so I think after a while they eventually bought the rights to that.

You know what it was? Whitehouse.com... inflammatory is an understatement. It was an adult website that someone had bought before the White House realized maybe this would be bad. Because, frankly, you don't necessarily think in terms of .gov, so there were probably a lot of people on the internet typing whitehouse.com trying to pull up George Bush's website, and boy, did they have a little surprise.

So a lot of companies will actually do this. For example, I'm not sure if CNN does it, but many large companies will actually buy their domain within a variety of popular top-level domains so that they are sure to point all of those domains to the correct servers. So similarly, we did talk about how many of the top-level domains are actually open, but there are a few restricted ones. With .edu, for example, you can't just go online and call yourself an educator and have a .edu domain. There is actually this sort of authorization process. Similarly .gov, .mil, and there are a few others that you just cannot buy a domain for. But with most of these other popular ones, .org, for example, you don't have to be an organization. You can buy it if you want. .net, similarly. And with .com, you don't have to be some commercial company in order to buy a domain in these top-level domains.

So what about these email addresses? So suppose I am sort of a neophyte and I type in something like Dan Allen Dan
Allen at MIT.EDU, in capitals, versus Dan Allen at mit.edu, in lowercase. Should they both work? Should one work? Neither? So it turns out both work, and the reason is that DNS, the domain name system, is case insensitive. It doesn't matter whether you type in all caps or all lowercase. This is generally useful because it just wards off nuisances like people accidentally getting the caps wrong, and CNN, just to satisfy their marketing people, can advertise itself in capitals, just to make the CNN more pronounced. I would say that generally you shouldn't assume that usernames are case insensitive, but for the most part I've never seen one that is case sensitive. Probably Dan would get mail even if you sort of obnoxiously wrote it in all caps. But this actually speaks to a whole other issue altogether. So suppose I wrote Dan, at whatever address, and then I said something like "HI DAN, I'M IN YOUR E1 CLASS AND HAVE A QUESTION." So I'm sort of reading this gently, but what is this really equivalent to saying to Dan? It's shouting. So this is bad form in what the world has decided would be called netiquette, which is just a way of describing sort of internet conventions. And I don't know why, and I'm supposed to stop saying these things on camera, but for years my dad would write me like this: "DEAR DAVID, HOW ARE YOU?" So quite literally shouting, or at least figuratively shouting. So we've sort of whittled that away. Other sorts of violations of netiquette these days, would you say? I still find it a little strange when professional people write things like "how are you." I mean, I've even gotten these from faculty sometimes, like "where are we." So that's either good or bad netiquette, that sort of internet speak these days. I think it's kind of sad. I mean, I remember reading articles about secondary school education where there was even one argument made by someone recently that we should just start to allow such things like this, because it makes the point clear, and who really cares if we're not formalizing things, teaching students to
write like this. There are some interesting sociological issues there, but I'm not a fan. There are some other ones, though. This has become more popularized and isn't necessarily so offensive as it is cool these days, but it's when you say something like "I win" and you have copious amounts of exclamation points. And now what people do beyond these exclamation points is that they tend to forget the Shift key, so you'll sometimes see this. And so what people have done is taken it to the next level, and they'll say this: instead of actually typing a 1, they'll actually type out "one" or "eleven," various things like that, just to really emphasize the point that they are winning. However, you'll see some things like FTW, which is "for the win." Oh man, there are a lot of them. We could go on for days about all of these acronyms. But I'm distancing myself, because I don't do this and I don't know how to do this. And these are all various forms. I mean, hopefully all of you have seen these, and most recently AT&T has popularized this NBD, "no big deal," with their commercials. So sometimes you'll hear, not only online but also among regular people talking to each other, people actually say these acronyms, because it's actually faster... or maybe not at all. But in any case, they sort of bridge this gap between the interwebs and real life. So another one is BRB, "be right back." And what a lot of people do, or what I have liked to do recently, just to be even more obnoxious, is to actually type out the letters, so BRB, for example, to really just prove that this is not saving anybody any time, because it takes some time to actually recall what some of these acronyms stand for. It's a great lecture. So let's pause. So we can talk for hours about internet technology and we can scare the hell out of you with acronyms, but it's perhaps more fun if we take a bit of direction. So we've sort of planted the seeds of some low-level networking, on which we'll build next week when we sort of
trace packets through the internet and actually look at data flowing across wires. We started talking tonight about applications running on top of the internet, email being one service or application that effectively runs on top of the internet. The web is another one. Actually, as Dan alluded to in the first lecture, for years this course had this sort of easy softball question on its exams, which is: are the internet and the World Wide Web the same thing? So now, never mind Dan's conflation of "inter" and "web" in this sort of all-in-one catchphrase: are the internet and the World Wide Web the same thing, technically? So, no. So there is a difference. And how would you describe the internet, even based on tonight's conversation thus far? So it's a large network, and it kind of refers to something physical, the network of networks, as you offered before. Whereas the World Wide Web, it's one of the networks, but it's kind of conceptually higher level, I would say. Like, what is the web? It's a bunch of web pages that are on servers that themselves are part of the internet. And so the web and email and instant messaging and any of these types of software that you pull up are really like services or programs running on top of the internet. So whereas I would argue the internet is something physical, an infrastructure, the web is really just a service on top of it, again like email and instant messaging and so forth. So what are some other services that exist on the internet? Email and the web, or the collection of web pages that we typically know, are very, very popular, but there are some other ones as well. There are many other ones. Yes? Skype. OK, yeah, sure. So Skype and other instant messaging programs, that's another service that exists on top of the internet. Anything else? Yes? AIM, right. So that falls under that same sort of category with instant messengers. Or Skype, more formally, I suppose, is VoIP, or voice over IP. I think I heard another suggestion.
Yes, FTP, very good. Yes, so FTP, which stands for the File Transfer Protocol, is a little bit different from HTTP in that you're not sending web pages back and forth; rather, this is used more for files, because, well, I mean, it stands for File Transfer Protocol. Any other ones? We'll actually use that, incidentally, later in the course. When you want to upload files to the websites that you're making, whether it's images or photographs or sounds or whatnot, you use a program like that to transfer files from your computer to our server. Though typically FTP in itself is not very secure, and what you'll see more often recently is SFTP, which might stand for what? Secure, right. It's the secure form. So typically you would want to use SFTP over FTP, and certainly FTP is sort of going by the wayside now because it's not secure. Any other popular things? Yes? So, GPS technology. That doesn't... so GPS works typically by transmissions from satellites, and you may be referring to obtaining the maps over the internet. That could be done either with SFTP or, most likely, HTTP, if it was just downloading the images for the maps. But GPS works a little bit differently. It doesn't typically all work over the internet like that. Let's actually push hard on one of the services, email. How many of you use some form of webmail, so a web-page-based mail like Gmail or Hotmail? OK. And how many of you use something like Outlook or Eudora, something client side? OK, so sort of the other half of you. So anytime you're configuring a local program like Outlook, Eudora, any of these mail clients, you usually have to fill in some blanks when configuring your account. So it's not as simple as just saying my address is malan@post.harvard.edu, you go fill out the details, though that would be a nice thing indeed. You instead have to provide some details. And can you recall any of the acronyms that refer to some of those fields you have to fill in? Yeah, so IMAP, in all caps. IMAP is one of them.
Another popular one... sorry? POP. So these, yet more acronyms, refer to protocols that mail clients and mail servers use to talk to one another. So whereas HTTP was for web browsers and web servers, there's IMAP and POP, and there's one other. Yeah, SMTP. So SMTP is another protocol related to mail. So if you're configuring a local program to access your email, then in addition to providing your email address, just so it knows what to stamp on the From field, and your username and password, so you can log in, you have to tell your program to whom you want to connect on the internet. And these are addresses that typically your ISP or your company will provide to you. So for instance, Harvard gives you access, if you so choose, to email accounts of the form something at fas.harvard.edu. Harvard's mail servers are called, respectively, imap.fas.harvard.edu, pop.fas.harvard.edu, and smtp.fas.harvard.edu. Also common is just to generalize it as mail.harvard.edu, or whatever your domain happens to be. But there are some interesting features. One of these is easy, as Dan's noted here: SMTP is the server that exists for sending your mail from your computer out onto the internet, so it's outgoing mail only. POP and IMAP are sort of alternatives to one another. So POP is a protocol that's kind of dated now, of decreasing utility these days, but it still exists, and its main feature is that when you connect to the server to download your mail, it literally downloads your mail to your local computer and either deletes it from the server or leaves it there. But the implication of leaving it there is that now you have this potential inconsistency. So my mom, for instance, was frustrated for years because she would usually check her mail at home using Outlook, from our local ISP, our cable modem provider's email account, but occasionally she would travel and she would use her ISP's webmail, so to speak, web page email: same account but a different interface that doesn't require any local software. And the problem was that if she had used POP to
download her mail at home, she couldn't access any of that same mail. Or if she left it on the server by checking another box, then she had this inconsistency, annoying perhaps, whereby if she deleted something on the server, she'd get home and it would still be in her inbox as though she hadn't replied to it. So you get this sort of state of confusion. IMAP is nice because, by contrast, take a guess as to what it does? It synchronizes, which is really compelling. It means that anytime you download your mail from the server, if you then delete that message on your computer, it disappears from the server. And if you log in via webmail, for instance, and check your mail elsewhere, at a friend's house or while you're on the road, and delete a mail or send a mail, it ends up gone on your home computer, or it even shows up in your home computer's sent mailbox even though you didn't send it from home. So for the most part, as sort of a course takeaway, if you have to choose from among these and you have the choice, go with IMAP, because it's just better. Now the gotcha is that if you're getting lots of email attachments, as most of us do these days, or a lot of spam, the problem is that with a lot of ISPs you'll end up running out of space. So unless you have something nice like Gmail, which gives you like a gigabyte or more of space these days, a lot of ISPs are still pretty stingy. They'll give you 50 megabytes, maybe a hundred, but that gets eaten up pretty quickly. So there are tradeoffs. And so one of the takeaways here, in even presenting all these acronyms, isn't just to throw minutiae at you but to have you appreciate the tradeoffs and the design decisions that went into these, so that you're using the one appropriate for you. Yeah, certainly. So just remember that POP really is best if you're only going to check your mail from one computer or one device and not really from any others, just because it downloads it off of the server and most of the time it deletes it off the server, so
thereby freeing up space to receive more mail on the server. But then also you get, like David said, this weird inconsistency between the server and the client side. Whereas IMAP keeps everything on the server, so you could fill up the space that you have more quickly, but you then will be able to look at it from multiple computers. But David said just a little while ago that you could get either one of these services from, let's say, mail.fas.harvard.edu, just as an example. And so this actually brings up the interesting notion that you can have one computer, so there's one server somewhere that has one specific IP address, that actually can provide more than just one service. So you can think of a computer being dedicated to just email, but it may not be set up that way. It could actually be that this server, w.x.y.z, not only offers the IMAP service; it also offers the POP service, and it also could be an HTTP server, so that you can actually connect to it and you may not even know, because you may not know the IP addresses of all of these servers. Of course, all of these things could have different domain names, but through the power of DNS they could all just point to the same IP address. So just because you have one computer doesn't mean that you have to devote it to one service. There are frequently servers that run multiple services on them. So just to make that a little bit more clear, and again if we can invite you to put on the so-called engineering hat: if every computer on the internet has a unique IP address, and you send a request to that computer, but sometimes the request is for a web page, sometimes the request is to send mail, and sometimes the request is to get your mail, how do you propose the server distinguish between those different types of requests? Like, what could it do? So it turns out that on that so-called virtual envelope in which the requests are going, like "get the home page" or "send the mail," there isn't just, on the outside, the IP
address of each of these servers but also a unique number. So all these big acronyms we've been alluding to also, for computers' convenience, have unique numbers, called port numbers, associated with them. So the port number associated with HTTP is, anyone know? 80. So the world decided some years ago that the number 80 is synonymous with HTTP. One is computer friendly, one is human friendly. The one for SMTP, random E1 trivia? Twenty... 25. And 21 is FTP. So you really are a geek if you can rattle off esoteric numbers like these. Other ones? IMAP, harder: 143. I win. POP I'll concede I don't know either. Dan? I don't know. So it turns out that we can prove as much. What I'm going to do, and it's a little small, is, instead of going to, say, cnn.com, or rather google.com... I'll zoom in for you. Instead of going to http://google.com, I'm going to be even more explicit. So it turns out that with URLs, uniform resource locators, which is what those things are, anything that's of the form protocol, colon, slash, slash, address, you can also end it with a colon and then explicitly specify what port number you want, and then the name of the file or just the default forward slash. So if all goes well, when I hit Enter here I'm going to see... how do I zoom out? It's making me look like a noob. OK, you can see that we have in fact reached Google. Now, they rewrote it for us. So you can include in your server's configuration little tricks that sort of quickly change that feature, just if you want to standardize the appearance of your website's URL. But it clearly worked, because it did in fact pull up Google. But by contrast, if I do something like http://google.com:25, which is a little strange because now I'm saying use HTTP, contact google.com, but talk to it on the mail server port, well, then, yeah, there's something wrong there. The port is restricted, because Google just hasn't allowed us to do that. So at the same time, though, you can actually have these services operate on
different ports. So what you will frequently see, maybe instead of port 80, if you ever look at an IP address, is some address and then colon 8080. So sometimes web servers won't operate off of port 80 but off of port 8080, and this is for a variety of reasons. But just realize that it is possible to map services onto different ports. So if we really wanted to confuse people, we could operate HTTP off of the SMTP port and force them to go to our website with a colon 25. But that's just not good practice. In general you wouldn't do it; you would want to stick to the standard ports, because then you're sure that things will just sort of fall into place and work. But this is how, through the use of these ports, you can actually get multiple services off of one server. You can actually have a number of ports. It goes from, let's see, port 1 all the way up to 65535. I guess it's actually port 0; we'll start at port 0. So there are a number of ports to choose from, and all of these services that we've been talking about, FTP, SFTP, AIM, Skype, all operate off of various ports. Some of them are fixed, like HTTP, where you would always connect to a server on port 80 for HTTP, but some of them are a little bit more dynamic. Instant messaging, for example, usually picks a port in some range at the higher end of this, and various other things. So we have a range of experience in the class, but this is always a gem that sort of enlightens at least one person, so we thought we'd do a little E1 email trivia here. I'm pretending to write an email, though I'm clearly writing this in, like, a Word document, so you really have to have no clue as to what you're doing if you're doing this for real. But it's from malan at post; it's going to go to Dan Allen at edu. But you know what, I'm going to say BCC jdcleg at fas.harvard.edu, and actually let's say CC there at sces.harvard.edu. So all four members of the staff. And we're going to say, "Man, that
John..." So what's the deal with this email? Who's going to get what? So it's from me. Who's it going to go to? OK, so everybody. Dan receives this email. What is he going to know? That it also went to Chris, because Chris's email is in the CC line, carbon copy. Is he going to know it went to John? So, no. So herein lies the functionality of blind carbon copy, BCC. It gets to the person, but none of the other recipients know who was in that field, because this is what the email looks like when I send it, but when anyone receives it, it actually looks like this. It goes to the person, but that header, so to speak, is stripped off. So this is maybe old hat for most of you, but there are certainly some people out there who don't quite know the difference. And there are also people out there who don't know the difference between reply and reply to all. If I could just quickly say... all right. So even John, when he receives that email, even though he was BCC'd, it doesn't show that he was BCC'd. He only receives the email, and he sees the same thing that everybody else does: who it was from, who it was to, who was carbon copied. But unless he can figure out that he was BCC'd, he may not necessarily know that. So another... I mean, even I've done this. If you're not paying attention to whether or not your email appears in the To or the CC field and you decide to hit reply and then you start typing, or maybe even worse, reply to all, it turns out that the people who didn't know you were BCC'd all of a sudden realize that, you know, David's been BCC'ing people, and I look bad, and you've sort of disclosed that you even got the correspondence. So sort of bad things can happen. Some email clients, maybe Gmail, some are good... oh, my BlackBerry, actually, maybe not Gmail. My BlackBerry has this wonderful feature where it actually tells you explicitly that you were BCC'd, because this isn't a hard thing for a computer to check. If you've gotten an email in your inbox and your name is nowhere to be found in the
headers, the BlackBerry takes it upon itself, at least mine does, to say you were BCC'd, so I don't do something stupid like reply to all, thereby revealing the sort of secret nature of the BCC. Unfortunately, almost every other mail client I've ever seen doesn't do that, so the burden is sort of on the user. Is this legit? Oh, another little trivia. So let's say in this To field here there's a username of Armin Dar is at mit.edu. Yes? No? So why is it illegitimate? So, no spaces. So pretty much, in email addresses the only valid characters are lowercase a through z, capital A through Z, and 0 through 9. What other characters? So underscore, hyphen, dot itself. Plus tends to work. What's that? Anything else? So there's the at sign, but that's sort of beyond the username, so really this is just the stuff to the left. So, yeah, that kind of rounds them up. Anything else? OK, good. Well, let's open the floor here to questions. So again, we have some infrastructure that describes the internet and, say, local networks, and we've now started talking about services quite familiar to you: web, email, and so forth. What are your questions? Before we go into that, I actually do want to make one mention about email. So we were talking about how everything on the internet communicates with another server or with another computer. If I need to send something to another computer, how does it know where to send it? Right, through this IP address. And the same sort of thing exists in email. So I'm just going to mute the image so I don't bring up any improper emails, but I just want to show you a certain trick with emails here. Let's see, where is a good one? We should be going to that website. I know, this one I really should delete, that email. Where was the... I'm glad we postponed Q&A for this. Well, yeah, I know, I should have done this earlier. But let's see. OK, so this email, for example. All right, so this is an email from John about section, and he sent this, you can see, on October 6,
then he emailed... what did he do here? He emailed whom? He emailed the staff, but he also BCC'd all of you. So this is sort of a way to protect everyone's email address from each other. But what's tricky is that there's a lot more information in this email than what is shown. These are called the raw headers, and it's all of the text associated with this email that's sort of sent behind the scenes, very similar to the HTTP headers that we saw earlier, with the GET / HTTP, et cetera. And so if we take a look, we can see all of the various bounces, all of the various locations that this email has passed through, so all of the various servers that have received this email. So if we scroll down, we can see the very first recipient of this email. It was sent from this particular IP address, which, if you remember, 10 dot something, is a private IP address. But what many people don't realize is that if you send an email from your client, let's say you're using Outlook, for example, and you send it from your home computer to a bunch of people and you say, "Oh, I can't do that job today, I'm sitting on the beach in the Bahamas," or something like that, they can look up your IP address and try to figure out, just based on the IP address, where you are actually located. So if this were 24 dot something dot something dot something, then I would know that something is up, because you're not really sending it from the beach in the Bahamas. Wait, wouldn't you tell people that you're sending it from home when you're in the Bahamas?
Well, no. I mean, you could say that you're on vacation when you're... I mean, you could do any number of things, right? So anyway, I got confused on the road. I suppose more typically you were in the Bahamas and you wanted to say you were sick or something, rather than actually claiming vacation days, and they could see that your IP address actually is not in the United States. There are actually websites that allow you to map IP addresses to a geographic location, as long as it's not a private IP address. If it was 192.168, we wouldn't be able to know, because thousands of routers actually have that IP address. But if that address was translated to a public address, then you would be able to figure it out pretty easily. However, there are always ways around this. You could actually log into a remote computer and send an email from there, and a variety of other things. If you were to send it via webmail, for example, it's usually sent from the server that you're connected to, and so that would be, you know, google.com or something like that, so you wouldn't get a lot of information. But this sort of thing does exist. Oh, and it's very real. I think I mentioned a couple of lectures ago my brief stint with the district attorney's office, and, you know, among the things that happen in society is that people send threatening emails. Well, a lot of these same idiots send them from their own personal computers and just sign them with different names or put in phony email addresses. But stamped inside those headers, as Dan has pointed out, is the IP address of that person's computer, and it's not all that hard then to subpoena Yahoo or Google or wherever and find out exactly who that person is, or who asked, more realistically. Questions?
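To make the raw-headers idea a bit more concrete, here is a minimal Python sketch, offered as an illustration rather than as what any particular mail client does. It assumes each server stamps a Received header with the sending machine's IP address in square brackets, which is a common convention but not a guarantee. The message text, host names, and both IP addresses below are invented for the example; they are not the email Dan showed.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical raw message, invented for illustration only.
RAW_MESSAGE = """\
Received: from smtp.example.org (smtp.example.org [24.61.10.20])
\tby inbox.example.net with ESMTP; Mon, 6 Oct 2008 10:00:05 -0400
Received: from laptop.local (unknown [10.0.1.5])
\tby smtp.example.org with ESMTP; Mon, 6 Oct 2008 10:00:01 -0400
From: sender@example.org
To: staff@example.net
Subject: section

See you in section.
"""

# Assumption: the sender's IPv4 address appears as [a.b.c.d] in each header.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw):
    """Return (ip, is_private) pairs for the Received headers, earliest hop first."""
    msg = message_from_string(raw)
    hops = []
    # Each server adds its Received header at the top, so the last one
    # in the message is the earliest hop; reverse to read in travel order.
    for header in reversed(msg.get_all("Received", [])):
        for text in IPV4_IN_BRACKETS.findall(header):
            ip = ipaddress.ip_address(text)
            hops.append((str(ip), ip.is_private))
    return hops


for ip, private in received_hops(RAW_MESSAGE):
    kind = "private, not mappable to a location" if private else "public"
    print(ip, "->", kind)
```

The private address (10 dot something here, or 192.168 dot something) says nothing about location, which is the point made above; only a public address could be handed to one of those geographic lookup websites.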
Questions, yeah. Correct. That's right, the only way you would know that you were BCC'd is that you have to look for your email address in the To field and the CC field. If it is not there, you can assume you were BCC'd. Or maybe there's some sort of weird forwarding going on, where you can have an email address that forwards to another email address, and so that's sort of another reason. But typically what can happen, especially with these long email forwards, is that people just pile addresses into the To field and the CC field and the BCC field, and you'd have to go through every one just to make sure you're not there. Netiquette: if you want to email a lot of people, that's cool, but put them in the BCC field, and not, as for instance my high school has done, and other big entities, emailing like the entire alumni list in the To field. It's kind of a newbie mistake. Other questions? Yeah. It'll be in two weeks, actually, so sorry about that. Yep, next week's the holiday, so when we said next week we meant two weeks hence for the internet. Yeah. Where would a DNS server typically reside?
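Going back to the BCC check described a moment ago, here is a rough Python sketch of what "look for your address in the To field and the CC field" could amount to. This is an assumption about how such a check might be written, not how the BlackBerry or any real mail client does it. The function name, the sample message, and the addresses are all hypothetical, and comparing addresses in lowercase is a simplification, since usernames are not guaranteed to be case insensitive.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message as a recipient would see it; the Bcc line is already gone.
SAMPLE = """\
From: David <david@example.edu>
To: dan@example.edu
Cc: Chris <chris@example.edu>
Subject: hello

Man, that John...
"""


def probably_bcc(raw, my_address):
    """Guess whether my_address was BCC'd: True if it is in neither To nor Cc."""
    msg = message_from_string(raw)
    visible = msg.get_all("To", []) + msg.get_all("Cc", [])
    # Simplification: compare whole addresses case-insensitively.
    listed = {addr.lower() for _name, addr in getaddresses(visible)}
    return my_address.lower() not in listed


print(probably_bcc(SAMPLE, "john@example.edu"))
print(probably_bcc(SAMPLE, "dan@example.edu"))
```

As noted above, forwarding can fool a check like this: mail forwarded from another account will also arrive without your address in either field, so the result is a guess rather than proof.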
So that's a tricky question. I was hoping that we would be able to push that off until next week, but there are many, many DNS servers that exist. So for example, your ISP could have one, and typically the DNS server has to be known in order for you to be able to contact it. So just like everything else, you have to have the IP address of that DNS server in order to be able to make requests to it and get answers from it. So just to, let's see, nail this idea home, I just want to see if I can look up the settings that I received from Harvard's router. OK, so here, just quickly, you can see that I'm using wireless right now. And I'll get to your question in just a minute; I just want to go off on a quick tangent because I wanted to show this before. So you can see that I'm connected wirelessly to an access point, one of the many that you see around near the ceilings over here, and using DHCP I was given this address, 140.247.228.31, the subnet mask, which we may go into next week, and the router, which is the next hop, basically the first computer that we contact when we want to go to the outside world. Now if I click on DNS, we can see that, also through DHCP, Harvard gave me these DNS servers. And so what do you notice about these top two? 140.247...
what... what can we infer just from the IP address about these DNS servers? Right, they're Harvard owned. They're somewhere on campus, perhaps, or maybe not even on campus, but they are at least Harvard owned and operated. And so you can imagine that there are more DNS servers than this, but Harvard operates its own DNS servers, and those DNS servers are made known to our computer when they are passed along through DHCP, essentially. And so there's even another DNS server, 128.103... well, I don't know where that is. But why might they have put in this third DNS server? Right, so just in case, for whatever reason, we cannot contact Harvard's DNS servers, maybe they've gone down or whatever, we can then try a third-party DNS server so that we can still access websites and the like. So there is this sort of hierarchy of DNS servers. You can imagine that if we have thousands of these DNS servers, we're going to have this weird synchronization problem: how do we know that one DNS server has the same records as all of the rest? Well, there's this sort of hierarchy where, if Harvard doesn't know the answer to some DNS query, it goes to some higher-level DNS servers, which are themselves known by the Harvard DNS servers, and it keeps going up this sort of chain to the top-level domain servers, which is just the sort of root set of servers that will know a number of domains and can point your request in a particular direction. But I think we'll go into this in a little bit more detail next week. Here, let me zoom out for you. See, you don't... can I set up a teaser for next time here? Set up a teaser all you'd like, but I hope that answers your question. One second, before we get to your question, I think David wants to show something. Go ahead. Yes, dynamic and static IP addresses. That technology still exists. In fact, they're both IP addresses by definition; however, a dynamic one is just provided to you dynamically, so by DHCP, for example. So you're given a dynamic IP address when your computer
contacts a DHCP server. You're then given, or assigned, an IP address from that DHCP server. So the reason that it's dynamic is that every so often your computer has to re-ask for an IP address, just to make sure it can use the same one, so that there are none of these conflicts that we were talking about before. So that's why it can be dynamic: it can actually change even while you are using it. Whereas static, for example, really isn't a good idea for laptops. Where it's more useful is for desktop computers, for example, where you know they're not going to move very much, where you can just input the IP address. Static means it doesn't change, and you know that that IP address is going to change very infrequently or not at all. So the difference is in the name here. When I showed you the network components here on my computer, it said "Using DHCP"; that IP address was dynamic. However, I could set the IP address manually; that would be a static IP address. It's just how you tell the computer what its IP address is that really changes the definition between dynamic and static. Good question, though. Anything else? Yes? OK, so that's a complicated question. You lose your data? So what's the best way to carry your laptop? And so typically you can carry it however you would like, so long as the laptop is off or asleep, which basically means that the hard drive is no longer spinning inside of it. Because remember, the hard drive is one of the last components in the computer that's actually physically moving. If you remember, when we were talking about it, the head is just millimeters or microns away from the platter. So if you shake it a lot, you can imagine that the head will actually sort of gouge and, not exactly like an Etch A Sketch but very similar, erase data, just because it's sort of destroying that magnetic bit of data that's on the top of the platter. So what you want to do before you move your laptop is to actually put it to sleep or turn it off, whichever you prefer. It doesn't really
matter, honestly, and then you can do whatever you like. You can use it as a Frisbee, though of course too much damage will also destroy components in it. But I don't think it really matters if you carry it in a bag with wheels or carry it in a backpack. As long as it's off or asleep, it really should be OK. Off meaning switched off, right, though sleep is a low-power mode which does the same thing, where it just turns off the hard drive so that it's no longer spinning. But when you wake it up from sleep, the reason that it remembers where you were, what you were doing, or all of the windows that you had up is that it maintains the state of RAM. So the battery isn't powering the hard drive or the processor anymore; it's just powering the RAM, so that all of the bits that are inside of the RAM are still being remembered while it's asleep. So that's why sleep is lower power than on but more power than off: some components are still powered in sleep mode. So Dan just committed the course to an extended six-month warranty for you if you go home and start doing this to it. Well, sleep or off. So one of the neatest things I saw years ago, when I was just learning about the internet and routers and how it all worked, was a few commands you can run, for instance on a Mac in this case, but even on Windows computers or on Linux computers. One that's very easy is this command, nslookup, and this will be our teaser for two weeks hence. If I'm curious to see what the IP address of cnn.com is, you'll notice this somewhat arcane command, name server lookup, just spits back those IP addresses, which normally I, the human, shouldn't care about, but which I, the sort of computer scientist, take a bit of an interest in. But more interesting than that is this little trick, which exists on most operating systems, called traceroute. Because I hypothesize, all right, if I'm sending an email from here to, say, my mom out in Washington, DC, well, let me go ahead and do traceroute
whitehouse.gov and see what exactly is between... oh, they're filtering. Yeah. Let's go and assume that my mom is a professor at stanford.edu and I want to send her an email from east coast to west coast, and so I hit Enter and... they too are filtering. Harvard filters. So there is actually this sort of enhanced version of traceroute, which not only shows you what is happening in terms of the IP addresses but also maps those IP addresses on a map. And let's see if I can find one. It's called graphical traceroute, and what you can do is enter an IP address... let's see, I hope this one works. What you can do is enter an IP address, and it will try to show you exactly where those packets are going, but on a map. It will show you where all of those little bits of data are going when you do that request. And so let's see. I think this one is not going to be so good, so let's do an nslookup on cnn.com so we can find out one of their IP addresses. And so you can see here one of their IP addresses is 157.166.224.26. I'll copy that, paste it over here in this graphical traceroute, and click Start, and let's see if this will work here. OK, so the map is pretty terrible, but you can see that it started in about... there is a much better one. No one is going to want to come back after this demo. It's embarrassing, I know. It's a terrible traceroute. We'll see you in two weeks. We'll have a real traceroute for next time. But thanks, see you soon.
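As a small follow-up to the nslookup teaser, here is a hedged Python sketch of the same idea: ask the operating system's resolver (and, through it, whatever DNS servers DHCP handed out) for the IPv4 addresses behind a name. It is not how nslookup itself is implemented. The function name `lookup` and its `resolver` parameter are made up for this example; the parameter exists only so the function can be exercised without a network connection.

```python
import socket


def lookup(hostname, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 addresses for hostname, in the order received."""
    # Ask only for IPv4 (AF_INET) stream results; each result is a 5-tuple
    # whose last element is the (address, port) pair.
    results = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a networked machine, calling `lookup("cnn.com")` should return a list of addresses much like the ones nslookup spat back, though the exact addresses will almost certainly differ from the 157.166.224.26 seen in the demo, since those records change over time.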