 All right. Welcome back to Computer Science E1. This is lecture five, The Internet Continued. So this is our second of two internet lectures. Next week, we move on to multimedia. And then shortly thereafter, website development and more, but going a little break between then. Next Monday recall is the course's first exam. The exam will be partly multiple choice in nature, partly true false, partly short answer, and largely oriented around conceptual questions and your understanding of the material and covering thus far. This coming Thursday section with John and Chris will be an exam review. Also next Monday during section, even though it's right before the exam itself, it'll be an opportunity for last minute Q&A. Come in, get some last minute questions answered. And what you'll find, and we'll distribute this via the course's website via link, you'll find that we'll make available a previous year's exam, so that you have a sense of the format, of the types of questions, and kind of the spirit of the exam. It's meant to be fairly, well, it's meant to be fun. And it is, again, meant to be sort of to assess your conceptual understanding. But the overarching goal, though, is to incentivize you to spend some time making sure the material is sinking in, even if you do have to cram a bit, because there is, again, this fire hydrant that we've been throwing at you. And so these exams, more than just try to squeeze numbers out of you, scores out of you, is really meant to kind of push as much of that information into you for long-term sake. So more on that in section this coming week. Any questions about that or beyond? So I thought I would, and Dan has allowed me to redeem myself, if you recall, a couple weeks ago, we left off with a completely failed traceroute example, which Dan then tried to save me with another failed traceroute example. 
So what I did this time was I connected to a machine outside of Harvard, because the problem last time, which we didn't anticipate, was that Harvard actually packet filters some of its traffic, somewhat for security reasons, somewhat for performance reasons, unfortunately got in the way of doing a very interesting demo. And this demo is to show a program called traceroute, which is a program that exists generally on computers running Linux, a certain operating system, increasingly popular. But it also exists on Mac OS and Windows, if you know the right commands to type. And what I'm very simply going to do is type this command called traceroute. It's a very uninteresting program visually. It's just a command line program whereby you type a command, hit Enter, and that's it. There's no Windows. There's no icons. There's nothing fancy. And it's going to look a little scary at first, but a whole bunch of output just got dumped to the screen. Each of these rows that were outputted, even though a few of them wrap, represent hops, so to speak, on the internet. So by typing in traceroute space cnn.com, what I have asked this program to do is to trace the route between my laptop and cnn.com. And apparently, there's a whole bunch of steps between me and cnn.com. Apparently, some that the program's struggling to even figure out, again, for performance or security reasons. But let's take a look at the first couple, at least. So if I scroll back up, and again, the syntax looks a little cryptic, because they show you a little more information than we care about. I can also demo my new fancy toy here with this pen here. Notice that when I typed in traceroute cnn.com, the very first thing it showed me was this, parenthetically, next to cnn.com. And this, just as a quick refresher, is called a what? IP address. So every computer on the internet's got an IP address, including cnn.com. In fact, that website has a bunch of IP addresses for scalability reasons. 
They have many, many web servers. And what this is telling me is that we're about to trace the route from me to that IP address. And take a guess, based on just what you might generally know about the internet, what each of these hops, what each of these rows between points A and B actually represents physically? Yeah, routers or servers. So specifically, a type of server called a router. And this is the kind of device that exists on the backbone of the internet, the really large infrastructure that tends to carry a lot of data from points A to points B. A router has data coming in. It checks the IP address of that packet, where it came from, and where it's destined to. It looks that up in a really big table, sort of like an Excel spreadsheet with two columns that says if the IP address is 1.2.3.something, go this way. If the IP address is 2.4.5.6, go this way instead. And quite literally, these routers take in data and route it in the appropriate direction. And by direction, I mean literally out of a different cable connected to some other router on the internet, or something a little more sophisticated. Like go this way and then head in a different direction on the internet. The routers get data from point A to B. So each of these rows in this output represents a router. The first of these is called something a little strange, dom zero dot something. The second one doesn't apparently have a name, it just has an IP address. The third, this is kind of interesting. It actually appears to belong to, yeah, it's actually amazon.com, the IP address. So does this one, so does this one. Then we see an IP address again, but now it gets a little more interesting. So first, let me explain this. And just to give you a teaser as to what other kinds of material are in vogue in computer science and in industry these days. 
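To make the two-column table idea concrete, here is a minimal sketch in Python of how a router might pick an outgoing cable. The table entries, the cable names, and the function name next_hop are all made up for illustration, and real routers are far more elaborate. The one assumption worth naming is that when several rows match, the most specific one wins.

```python
import ipaddress

# Hypothetical two-column table: destination prefix -> outgoing cable.
# "1.2.3.something" is written as the prefix 1.2.3.0/24.
ROUTES = [
    (ipaddress.ip_network("1.2.3.0/24"), "cable A"),
    (ipaddress.ip_network("2.4.5.6/32"), "cable B"),
    (ipaddress.ip_network("0.0.0.0/0"), "default uplink"),  # matches anything
]


def next_hop(destination):
    """Return the outgoing cable for the most specific matching row."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, cable) for net, cable in ROUTES if addr in net]
    # The default row always matches, so matches is never empty.
    best_net, best_cable = max(matches, key=lambda row: row[0].prefixlen)
    return best_cable
```

Calling next_hop("1.2.3.77") picks "cable A", while an address that matches no specific row falls through to the default uplink.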
This is actually an excerpt from a problem set from another course I teach at the college in which we're actually using not Harvard servers for the students to write their programs on. It's a programming course in a language called C. Instead of using Harvard servers, we're actually using amazon.com server. So long story short, Amazon has been at the forefront of a trend called cloud computing, which is this really sexy term that just describes the use of Amazon owning a lot of fancy hardware. Really with lots of RAM, lots of hard disk space, lots of CPU cycles that they then partition into smaller portions so that people like us can pay for just a slice of that machine, so to speak. And we get what's called not a physical server, but a virtual server, a virtual machine whereby we have the illusion of having our own server, or actually our case servers on Amazon's infrastructure. But it turns out that we and some other clients actually have our own servers living on their same physical hardware. We just don't know it because of various security protocols they have in place. It looks to us like we're the only ones there, but in fact we're sharing RAM and CPU and hard disk space. So the reason that you were actually seeing those several rows spit out addresses belonging to Amazon is because what I did is I technically lied. I didn't connect from my laptop per se to cnn.com. I first connected from my laptop to Amazon, to my courses infrastructure, so that we could kind of get around Harvard's own security restrictions. So what we're showing you is the route between my courses infrastructure, Amazon's infrastructure, and cnn.com. So similar idea, we're just getting around the nuisance from last week. So this is our depiction of a cloud, and suffice it to say that it's a neat trend in that it allows us to spawn more and more virtual servers literally by typing commands without physically having to buy or plug anything whatsoever in. 
But I said the story was about to get interesting because after we go from our address down through this Amazon router, this Amazon router, this Amazon router, finally things get a little less familiar. So it looks like at this point, our bits from my computer to cnn.com are going through where. Looks like Washington, probably Washington DC through a company called level3.net. So level3 is actually a really big ISP like MCI, AT&T that have really big routers and really a lot of infrastructure on the so called backbone of the internet. And it looks like Amazon is actually patched into level three. So Amazon's ISP appears to be this company called level three. Looks like level three has a couple routers. This one that I circled, one below it, one more down here, yet another one here. But oh, this is interesting. At this point in the story, line 11, where are my bits from my laptop? So they're all the way down in Atlanta. Now, this is just convenient that these people have named their routers with the cities in place, they don't need to do this. And clearly up above, they didn't do this. But it gives you really interesting insight into the path that my data is taking. Now these rows are pretty long and it turns out all of these things at the end, like 16.032ms milliseconds and 19 milliseconds and six milliseconds. So what this program TraceRoute does beyond just showing you where your data is going, router to router, it's also running some tests telling you how many milliseconds it's taking for packets to go from point A to point B. So the fascinating takeaway, frankly, I think still, even after all these years of exposure to this stuff, is that to get from my laptop, or technically Amazon servers, from, let's say, Cambridge, Massachusetts to Atlanta, takes how long? Like 15 milliseconds. That's pretty damn fast for bits to go from one place across city lines, state lines to another city altogether. Yeah, this is called TraceRoute, one word, T-R-A-C-E-R-O-U-T-E. 
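If you wanted to pull the hop number, the router's name, its IP address, and those millisecond timings out of a row like the ones on screen, a small Python sketch might look like the following. The row layout assumed here (hop number, host name, IP address in parentheses, then timings ending in "ms") is the common Linux layout, and the sample host name and address in the usage note are invented, not taken from the demo.

```python
import re

# Assumed row layout: "<hop>  <host> (<ip>)  <t1> ms  <t2> ms  <t3> ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
MS_RE = re.compile(r"([\d.]+)\s*ms")


def parse_hop(line):
    """Parse one traceroute row; return None if it doesn't fit the layout."""
    match = HOP_RE.match(line)
    if match is None:
        return None  # e.g. rows of "* * *" where the router didn't answer
    hop, host, ip, rest = match.groups()
    times = [float(t) for t in MS_RE.findall(rest)]
    return {"hop": int(hop), "host": host, "ip": ip, "ms": times}
```

For example, parse_hop(" 11  router.atlanta.example.net (192.0.2.17)  16.032 ms  19.114 ms  6.250 ms") gives hop 11 with three timings, and a row of asterisks gives None.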
And so now the last little demo I'd like to show before Dan one ups me with something more graphical is to actually go not to cnn.com, but just because I know it exists, cnn.co.jp, which as you might imagine is actually CNN's web server, probably in Japan. Let's see what kind of output we get now. So I'm going to hit Enter. Some of it's going to scroll by pretty fast, but what do you notice immediately perhaps somewhere on the screen here? What's interesting if overwhelming at first glance? What's interesting? Oh, taking longer. Where does it really start to take longer? Yeah, looks like line 10, right? So it's not quite clear to me from this output what cities the data starts in. Although, you know what? I'm going to conjecture that, though it's perhaps a coincidence, that maybe our data is maybe in the US, Massachusetts in the beginning. Conjecture, not certain that's true, but maybe just based on that host's name. Looks like USMA. I had anyone want to guess. I actually don't know what this might represent. It might not represent anything that we would understand. But what does seem pretty clear to me that between line 10 and 11, this is kind of a huge difference, right? And even between line 9 and 10. So I'm going to conjecture that there's something between lines 11 and 10, because where is line 11? It's apparently in Tokyo, Japan, if we're reading these host's names correctly. So what would you conjecture is between lines 10 and 11, or certainly lines 9 and 11? An ocean, like literally. So yeah, it kind of takes a bit of time for data to go farther distances. And certainly when you have an ocean, whether it's the Atlantic or the Pacific, or maybe some other kind of link, maybe satellites in some parts of the world going from location to location. So one of tonight's foci for material is going to be on this infrastructure of the internet and how data gets from point A to B on really a macroscopic scale. 
Whereas two weeks ago, we focused more on the home networking aspects and the smaller pieces. But when you start to plug LANs together and get WANs, and then you plug WANs together with WANs, you get that thing called the internet. Case in point, this is the kind of stuff that's actually driving it all. So this is all pretty interesting, isn't it? You get to see with at least some reasonable confidence where your packets or where your requests are going. But one of the things that this doesn't show you is where exactly on a map each of these things is. So we can conjecture like David said up there, line four is probably in Massachusetts. And line eight, well, I don't think we were able to figure out where that one is. But there is this sort of way of looking up where an IP address is located on a geographical map, basically just on a Google Maps, for example. You can look up where an IP address might be represented. This isn't information that you can just go to some website and find very easily. Usually it is provided as a service by companies from which you can purchase, for example, a list of IP addresses that map to specific geographical locations. And also they aren't terribly precise. You can only know within a reasonable certainty that it's, let's say, within this city or within another city. But it does give you some additional information that we just don't get from this text-based interface. So if you can switch me over. So this is a graphical traceroute tool or a visual traceroute tool, depending on what you want to call it. And unfortunately, because we still have this problem where we can't perform a traceroute from Harvard servers, I have to do it from their servers. And they are actually located in LA. So let's just pretend that we are whisked away to LA and all of this will sort of look the same. And here what I've done, while David was blabbing about stuff, is the same sort of traceroute to CNN.com. 
And since, remember, this is from their server in LA, we get to follow the hops that these traces have gone. And so if we zoom in more and more, we can see that it actually started way down here in palms, it looks like, before moving on to LA and finally going all the way across the US, the continental US, to North Carolina, and then finally to what looks like Atlanta in Georgia. But what's great about this tool is that if we type in some website, so let's just say Google.com and do a host trace, might take just a few seconds for it to go. But now you can see the hops as they occur. Excuse me. So first we started at LA. It looked like it jumped all the way to New York and then it went all the way back down to, or no, it went up to San Francisco area. So what happened here? Why did it go all the way across the United States before finally going back? And remember this has nothing to do with the fact that we are up in Massachusetts because to this server it doesn't care. It's performing this traceroute as though it were starting in LA. We are telling a server remotely, the server that's in LA to start this traceroute. So why might it have done this sort of ridiculous hop, 3,000 miles in the wrong direction? Yes. Right, yeah, so well, yeah, you could, let's see, the DNS server. It's probably not, it's probably not the DNS server that tells it to do the hop, but it's most likely these routers that we've been talking about where the best route for whatever reason is deemed to be going this direction for the IP address that we are given from Google. So I suppose it does make a little bit use of DNS in that we have to first look up the IP address of Google.com, or rather we don't do it, but this traceroute does it for us. And you can see that a number of IP addresses are provided to us. We don't know which one it selected, but one of those was routed all the way from LA to New York for whatever reason. I mean, it could be a number of things. 
It could be that that particular IP range actually exists in New York, and then they figured out in New York that this more specific IP address is actually in San Francisco, or it could be some sort of a routing problem between LA and San Francisco. It's really tough to tell, but the point is that it went across these internet backbones that David was talking about, these huge thick chunks of wire, most likely fiber optic cables, and it went all the way from a very large cluster of servers in LA all the way to New York and back. Where this gets more interesting though is when we do something overseas. So for example, like the example that David just gave, cnn.co.jp, which of course is CNN's Japanese version. So let's see, I clicked on host trace and it usually just takes a couple seconds for it to get going. We can see here it's been 15 seconds so far that it's been doing this. So it may actually be collecting data before it actually shows us visually what's happening. So it goes to LA before making a jump all the way to somewhere in China, and then finally to Japan. So this is probably just slightly off. I'm guessing that the packets don't go all the way to the middle of mainland China before going back to Japan. Most likely it's going to one of the larger cities on the East Coast, Beijing, Shanghai, or one of those, which most likely has a very large cable, literally a large cable under water from LA all the way to that one specific city in China, before going through another undersea cable to Japan. So you may remember, or you may not remember, but a number of months ago, people were concerned because a lot of fiber optic cables in the Middle East were actually being disconnected or being cut, literally. Do you remember any of this happening? There was, let's see, I forget the exact specifics, but which country was it that was being isolated by these cuts? You ever remember? No. Was it Iraq? I don't. 
Oh, I don't think it was Iraq. I'll have to look it up. But anyway, this really shows that we do have a finite number of cables that link us between these large land masses. And if they are cut, we are going to lose connectivity from us to the websites that exist on the other side of the world, literally. So how does this actually happen though? We've been talking at a somewhat higher level about how all of these things operate and work, but how exactly do we get this data and send it across this wire to some other server somewhere else in the world? So Dan showed us these lines. So the data is flowing across some connections here, and that actually begs an interesting question when it comes to this routing of data. So is the shortest path between two points on the internet a straight line? So what might that even mean? So in this case, assuming the data is correct, it looks like we're kind of going west and then hooking a left or hooking a right and going back to the east, which seems at first glance to be a bit wasteful because we're covering many, many miles unnecessarily. Is that necessarily a bad thing though, would you say? Yeah. Yeah, absolutely. So among the decisions routers need to make is not necessarily just what direction should this traffic go in but what other directions could it go in? So one of the things that routers are supposed to handle also is what's called congestion control. And so if a packet of yours reaches a router that's really backed up, the tubes are clogged, in Senator Ted Stevens speak, well, what can that router do with your packet? Well, if you are a router, if you're a computer and you're getting way too much data coming in, you're getting overwhelmed by all these packets. What's perhaps the simplest, if most naive and obnoxious thing you could do with that packet? Reroute it. Oh, so I can come up with something more obnoxious. Even better. Drop it. Drop it. Just ignore it. Pretend you never even got it. 
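The "just drop it" idea is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the Router class, its capacity parameter, and its method names are hypothetical, and a real router's buffering is considerably more involved. The point is only that once the waiting space is full, new arrivals are ignored.

```python
from collections import deque


class Router:
    """Toy router with a fixed-size buffer for incoming packets."""

    def __init__(self, capacity):
        self.capacity = capacity  # how many packets the buffer can hold
        self.queue = deque()
        self.dropped = 0

    def receive(self, packet):
        """Queue the packet, or silently drop it if the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # pretend we never even got it
            return False
        self.queue.append(packet)
        return True

    def forward(self):
        """Send the oldest queued packet on its way, if there is one."""
        return self.queue.popleft() if self.queue else None
```

With a capacity of two, a third packet arriving before anything has been forwarded is simply counted as dropped, and the sender is never told.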
And in fact, that is the behavior of many routers. If their queues, their buffers, the space they reserve for incoming packets, get overwhelmed, well, arguably the most effective thing they can do to sort of keep up with the rest of the data is just ignore the data that they simply can't keep up with. And so that kind of suggests that a lot of your internet traffic, your emails from you to mom or your webpage requests from you to cnn.com, might actually never reach their destination if there are certain hotspots on the internet where, just because of various routing decisions that are not within your control, you hit these bottlenecks, and maybe your data's not actually going through. Now, this has probably happened to you, right? We've probably all sent an email or claimed to have sent an email, and it actually didn't get there. And let's hypothesize for a moment before we tease apart some of the technicalities here. Like, what might have happened if you, for instance, send an email to someone and they claim that it just didn't get there? Hypothesize, what might explain that? Since this is perhaps a very common sort of question or predicament, and perhaps if you have IT folks who just kind of like to give you the runaround, you maybe don't necessarily get a compelling answer or a true one. So what could it be? What's that? So bottlenecks on the server. So bottlenecks on the server, and to be more precise, which server? Where? Okay, good. So just to elaborate, there are many different hops between you and the other person's email server. And we talked about this two weeks ago, what's the name or the acronym that describes an outgoing mail server? The thing that your client sends mail to, and then it takes care of getting it the rest of the way. What kind of server was that? Yeah, SMTP, simple mail transfer protocol. So not a big deal to remember what that expands to, but SMTP is an outgoing mail server. 
And so if you're using Outlook or Eudora or Thunderbird or other client-side programs, that is, not Yahoo Mail, not Google Mail, which are web-based programs, well, you probably at some point, or someone for you, configured your client to connect to that mail server. And that mail server's purpose in life, an SMTP server's purpose in life, is to take mail from you, it being an outgoing server, and then get it along its way and send it to other SMTP servers that are one step closer, so to speak, to the recipient. So hypothesize what might go wrong? Well, let's be really simplistic. Your own internet connection goes down, Comcast goes down, corporate internet goes down, and you just don't realize that the email's still sitting where? On your computer, in your Outbox, right? So maybe the onus is on you to make sure it's not your fault or at least your own network's fault. So now let's suppose you fix that. You go into your Outbox, you realize, wow, email's still there. Let me double click it, open it up, and hit send again, because now my internet connection's back up. Okay, now it goes out. All right, so where else could it get stuck? Well, maybe that first SMTP server. Maybe the server was itself responding to incoming requests, but for whatever reason it was misconfigured, it was malfunctioning, and so the email is now stuck in the mail server. And in fact, this certainly happens. Multiple times over the years, emails that I've sent or received or was supposed to have received from people have gotten delivered days later because one of Harvard's mail servers was acting up and the sysadmins didn't realize that there was literally a folder full of mail backing up, backing up, backing up that never actually got delivered to people's desktops. So there too is an option. But now this is very high level stuff, right? And our theme two weeks ago was not only to discuss some of the hardware stuff, but these services of the internet, mail and web and these kinds of things. 
But now today we're clearly going a little deeper into the story, right? On the backbone of the internet. So beyond the specific mail server and your client, where else might there now be a problem based on these kinds of discussions? Yeah? So a router is congested or a router blows up or there's a routing problem, like someone screws up somewhere between you and point B, and so data hits some router and gets dropped. Maybe it gets stuck somewhere in some kind of queue. Maybe there's a circular routing problem where your data is literally kind of going around in circles. But fortunately, the internet actually has built-in mechanisms to prevent that. In fact, what was the highest number we saw on that traceroute output? Yeah, it was like 30. So in fact, built into TCP/IP, if you recall from two weeks ago, the language that computers speak on the internet, there's generally this thing called the TTL, Time to Live. And this is like a ticking time bomb that's built into the design of any of the data you send across the internet. And this TTL's purpose in life is just to be a number, initialized to like the number 30. And every time your packet, your email, your instant message, whatever, reaches another router, take a guess as to what that router does. Well, it doesn't reset it per se, and it doesn't extend it. It decrements it. So it goes from 30 to 29 to 28. The idea being that your packet had better reach point B within 30 hops, otherwise the internet, the routers, are just gonna assume something's wrong. You really don't need 30 hops to go around the world these days. So we're gonna assume that there's some kind of loop, and therefore at that point, the routers are intentionally going to drop your traffic so that the internet doesn't get completely bogged down with just cycles and cycles of mail. Now it might not necessarily be 30 because that does feel kind of low, but certainly attached to a lot of data these days are these TTLs to handle exactly that situation. 
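Here is a minimal Python sketch of that ticking time bomb. The function name deliver and the way the path is modeled (just a count of routers between you and point B) are assumptions for illustration. The starting value of 30 in the usage note is the lecture's ballpark figure, not a fixed rule.

```python
def deliver(ttl, routers_on_path):
    """Walk a packet past each router, decrementing its TTL every time.

    Returns (delivered, routers_visited). A router that decrements the
    TTL to zero drops the packet instead of passing it along.
    """
    for visited in range(1, routers_on_path + 1):
        ttl -= 1  # every router takes one off
        if ttl <= 0:
            return False, visited  # dropped: assume a loop or a problem
    return True, routers_on_path
```

So deliver(30, 12) gets through after 12 routers, while a packet stuck going around in circles, say deliver(30, 1000), is killed at the 30th router rather than cycling forever.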
So what if now your data does reach some router and it is just flat out dropped? Do you now have a get out of jail free card when it comes to saying I did submit my homework by email? Or whatever the important email might have been. So if routers are congested and data is dropped, are you SOL? Not in this class, no. So besides there being these proactive mechanisms for intentionally killing data, there are also mechanisms built into TCP/IP that allow for the retransmission of data. So the TCP in the phrase TCP/IP actually stands for transmission control protocol, which is just a fancy way of saying that there's some underlying functionality in the internet that quote unquote guarantees that data will get from A to B. Not necessarily on the first attempt, but eventually. Because now again, put on your engineering hat. You're trying to implement a more robust internet and you just have to accept the reality that some of these routers on the internet will occasionally just drop packets. What could you, the designer of a PC or a Mac or any computer connected to the internet, do to kind of work around that unfortunate reality? Yeah, exactly. If you're the author of the TCP/IP software, the author of whatever the networking software is running on a PC or a Mac, let's assume for the moment there's a way of your detecting if data got from A to B. And if it did, great, you're done. But if it didn't get from A to B, what's the simplest, if most naive thing you could do? Well, just resend it. And in fact, that is one of the most compelling features of TCP. The reason that it is so omnipresent in, like, the dialog windows on a PC and a Mac today is because it's so fundamentally integrated into the infrastructure of the internet. If data fails to get from point A to B for almost any reason other than user error. So email sitting in your outbox, not really TCP/IP's fault. 
There's something else going on there. But once it's out the door and out on the internet, if it gets dropped or it gets corrupted or something else low level happens, TCP/IP's purpose in life, running on your computer as part of your operating system, is to say, hmm, that data did not get to point B, I'm gonna send the exact same packet again. And again, and again, and maybe at some point I'm just gonna flat out give up and tell the user, sorry, your internet connection's clearly down. But that's what you get from TCP, this reliable or this guaranteed delivery. But now let's push a little harder. How in the world could your computer detect if data made it from you, point A, to point B? If there's like 30 other computers between you, right? It's not like there's someone on the other end of the line so to speak, because there's so many damn hops between you and that other computer. And it's not like a phone call where you have this open connection. Literally the data's going, maybe pausing, going, pausing. It's not like you're waiting on the phone. So what does the internet probably do whereby you can detect if data got to point B? What's, again, the simplest thing you could do as an engineer here? A read receipt, right? So if you just wanna know if the data got to B, why not design the internet in such a way that the recipient B, at this very low level, not at the email level, but this low internet level, just acknowledges the fact that it got this packet? And that's precisely what TCP also does. In order to implement this guaranteed delivery, what it does is add to that virtual envelope we started talking about last week, inside of which is your webpage request, or your email, or your instant message, outside of which is your IP address and the destination IP address. 
In addition to all that information is also what's called a sequence number, literally a number, maybe the number one, maybe the number two, depends on how many packets you've actually sent. And so what the other computer does, if it too supports TCP, is it's supposed to respond to every one of the packets that your computer sends it by saying I acknowledge receipt of packet one, two, three, four. And so you, the sender, if you never get back that acknowledgement, what can you do? Just resend it again and again and again, waiting for that receipt. Now there's arguably an inefficiency there, right? There's a lot of cross talk, so there are optimizations built in whereby smart computers don't acknowledge every single packet, but rather packets in groups of like 10 or 100, and say yeah, I got the past 100 packets, move on. So there are ways to avoid what might otherwise be inefficiencies. So if we can just take a quick step back, I did actually find an article regarding these cut wires, and this is actually a very good reason why a router might not be able to respond. If one router knows that it has a link between it and, let's say, this router that exists in LA, and it knows that it has a link between it and a router at the far side of the Pacific Ocean, what happens if that is cut? And this was a problem that started happening and it was something like, I didn't read this very closely, but it's something like three or four of these important lines were actually cut and it caused a number of problems in the Middle East. Specifically it seemed to have affected Egypt, the United Arab Emirates, Qatar and also India. And so these people were experiencing what? What do you think they might experience if all of these lines were cut? Yeah, timeout errors or at least incredibly slow connection times. All of these packets, rather than being sent through, let's say, four or five cables, now had to be all piped through one. 
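Stepping back to the sequence numbers and acknowledgements described a moment ago, here is a small Python sketch of the resend-until-acknowledged idea. Everything here is a simplification under stated assumptions: the function send_reliably, the is_lost callback that stands in for the network, and the limit of five attempts are all hypothetical, and this acknowledges one packet at a time rather than in groups the way the optimizations mentioned above would.

```python
def send_reliably(packets, is_lost, max_attempts=5):
    """Send packets one at a time, resending until each is acknowledged.

    is_lost(seq, attempt) is a hypothetical stand-in for the network: it
    returns True when that transmission, or its acknowledgement, is dropped.
    Returns a list of (sequence_number, attempts_needed) pairs.
    """
    log = []
    for seq, _payload in enumerate(packets, start=1):
        for attempt in range(1, max_attempts + 1):
            if not is_lost(seq, attempt):
                log.append((seq, attempt))  # acknowledgement came back
                break
        else:
            # Never acknowledged: flat out give up and tell the user.
            raise ConnectionError(f"gave up on packet {seq}")
    return log
```

If the network eats the first two transmissions of packet two, say with lost = {(2, 1), (2, 2)} and is_lost=lambda s, a: (s, a) in lost, then packet two simply goes out a third time and the rest arrive on the first try.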
So if you wanna think of it, again, somewhat like Ted Stevens would in just as a pipe, for example, you're forcing more stuff into this pipe and it's just going to be a lot harder to send. And so I don't think they ever actually solved the mystery of what caused it, but they do have a neat diagram here of some of the lines that exist in the world. And you can see that we have a number of connections between the United States and let's say Japan and China, it appears, and a lot of connections between us and Europe. And so if one of these lines gets cut or if one of the routers at one end of these lines goes down, it's probably not too big of a deal, but if we start losing multiple of them, it is going to be a really big deal because now we have to send all of our data across consistently busier and busier lines. And as the internet grows, they actually have to lay down more lines, as you can imagine, in order to keep these ping times or these trace route times low so that things actually start or things continue to operate quickly. So that when we make a request for some website or for some server over the ocean, we will get a response from it in some reasonable amount of time. So what David was talking about TCP, IP before, or TCP before is that we get this sort of guarantee of a delivery from one point to another. But let's try to tear it apart just a little bit more. If you have to send a lot of data, how might you try best to send it besides just putting the IP address of the destination what can we do to this data to try to make it easier so to speak for these routers to handle? So we could compress it, but is there something else that we could do instead that would make it smaller and easier to handle? Right, exactly. That is one thing that we could do is separate it. We could divide a very large packet into many smaller packets. So instead of having just one large packet, for example, we could just have four. 
That is, when you reconstruct all of these packets, the sum is again the original message. So let's say that I just wanted to send an email to my mom, for example, and I say, hi mom, how are you, et cetera. And if I wanted to send this actual email, one way I could break it up is just to look at the email as a whole, realize that it has some size, and divide it into some even number of packets. Or, just to make this an easier concept to grasp, hopefully, what I'll do is divide it into words, even though this is not how it actually works. It actually divides it into more regular chunks rather than into specific words. So now, rather than having this one large email to send, I will just send several packets with the word hi, the word mom, so on and so forth, and I will actually address each of these packets to the destination IP address and send them along the way. So what are some other things, though, that we might have to add to these packets in order for the receiving server to know what to do with them? Yes. Right, the receiving server has to know the order. So the best way to do this would just be to literally number them. So this is the first packet, this is the second packet, and so on and so forth for however many packets I have. But what else might be useful information that these packets should have? So I said the destination IP address, and it would be the same in each of these cases. But, well, let me just talk about this for a second. What is the destination IP address? Would it be the IP address of my mom's computer if I was sending an email? Right, exactly. So this destination IP address is different from the email address of my mom, in that what I am sending this packet to is my mail server. So this SMTP server, for example, is going to handle relaying the email as a whole to the proper server that my mom would then connect to and download the email from to read it.
Okay, so we have an order number, a destination IP address, and what else do you think we might need? Anything else? So what does the mail server have to do in order to notify us that it has received all of the packets in an email or that one packet was not received? Right, so maybe not the MAC address but at least the sender's IP address, so that the server, when it receives the packets, will be able to notify me that it has received all of them, or that it is missing one of four or two of four of the packets. So one last thing that might be useful is to tell the server how many packets we are actually sending it. So this might be one of four, this would be two of four, this would be three of four, for example, because that way, if it receives, let's say, three packets, it would know exactly how many it should still receive after that. So in this case, if it receives these first three packets, it knows that it still has to receive the fourth before it can reconstruct this email, or put it all back together, and deliver it to my mom's email address. What's the point of all that? Why bother going through all these hoops to chop an email up into like four pieces or more pieces when you could just send the whole thing at once? What's the answer to the who cares question here? Like why bother? Speed, how so? The size of. So the size is smaller, so individual packets go faster, but you probably can't use the email for anything useful until it's all arrived, in which case it's still gonna take as long in the aggregate, right? That's a good thought though, it's close. Sure, sure. So what about the protocol though? It is, well so yes, it is. And the IP half of TCP/IP is precisely what handles what's formally called fragmentation, where you chop up big pieces of data into smaller ones.
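To make the labelling concrete, here is a minimal sketch in Python of the scheme just described: chop a message into pieces, stamp each piece with the sender, the destination, its number and the total, then let the receiver work out what is missing and put the pieces back in order. The `Packet` class, the function names, the chunk size and the two private IP addresses are all made up for illustration; real IP and TCP headers carry more fields and do not work on words or characters like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str      # sender's IP address, so the receiver knows whom to notify
    dst: str      # destination IP address (the mail server, not mom's computer)
    seq: int      # position of this piece, starting at 1
    total: int    # how many pieces make up the whole message ("2 of 4")
    payload: str


def fragment(message: str, src: str, dst: str, size: int) -> list[Packet]:
    """Split the message into fixed-size chunks and label each one."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(src, dst, n, len(chunks), c) for n, c in enumerate(chunks, start=1)]


def missing(received: list[Packet]) -> list[int]:
    """Sequence numbers the receiver still has to ask for."""
    if not received:
        return []
    have = {p.seq for p in received}
    return [n for n in range(1, received[0].total + 1) if n not in have]


def reassemble(received: list[Packet]) -> str:
    """Put the pieces back in order, whatever order they arrived in."""
    gaps = missing(received)
    if gaps:
        raise ValueError(f"still waiting for packets {gaps}")
    return "".join(p.payload for p in sorted(received, key=lambda p: p.seq))


# Hypothetical addresses: 10.0.0.5 is "my laptop", 10.0.0.25 is "my mail server".
packets = fragment("hi mom, how are you", "10.0.0.5", "10.0.0.25", size=5)
```

With this in hand, `missing(packets[:3])` reports that packet 4 has not arrived yet, and `reassemble` gives back the original text even if the list is handed over in reverse order.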
And so I'm kind of picking on you a little too much because it is speed, but, oh, that's good actually. Yeah, no, that's a good idea actually, right? If you make the pieces of data that you're sending smaller and you do have some corruption because of line noise or some kind of electrical interference or just whatever goes wrong, ideally you then have less data to resend. And so I mentioned speed before, so why might it behoove not just you, but other people, to chop your data into a whole bunch of smaller pieces, right? Suppose that we're not sending an email but you're downloading some really large illegal movie, right, it's like 500 megabytes and you're downloading it off of BitTorrent or some other website, the whole darn movie file, or MPEG-4, or whatever, topics we'll get to in our multimedia lecture. Why might that annoy other users? So you're eating up bandwidth, and if you're downloading this really big file and it's one huge flow of bits, that might very well be to the exclusion of other people using the internet at all, right? So Ted Stevens gets picked on a lot for that comment about, I didn't get my email for three days, and it's because there's all these dump trucks in the internet clogging things up. I mean, he had sort of the right idea there, that data can back up, but the reason that IP and similar mechanisms actually chop up your data is so that you actually have a more equitable use of the medium.
So by chopping your email up into four parts and everybody else's email up into four parts, that means you're not sending things in a queue whereby the first person to send mail gets there first even if it's huge, thereby backing up everyone else, but rather you're sort of sharing things more fairly where the first guy's email, a quarter of it's gonna go, then the next guy's gonna get a quarter of his email to go, then the next guy, then the next guy, then it's gonna repeat so that in this way even if you sent your email last, but it's the smallest of all of the emails, right? You're the only one without an attachment, your email might in fact get there first because you're getting an equal share and equal slice of the pie so to speak by having these packets fragmented in this way. So you don't in fact have as Ted Stevens alluded to backups on the order of three days long as to why his email is not getting there. That frankly is more about should have checked his out box, should have talked to his sister, man, it doesn't boil down to these low level specifics, but certainly the idea is a fair one and in fact he did have in fairness a lot of the right ideas in terms of talking about the internet as tubes and a series of pipes interconnected. What I think is laughable and sort of fair game with him is that this man is in charge and part of regulating all this stuff and his metaphors were not sort of suggestive of mastery of the topic trying to simplify it for the masses but perhaps having had it simplified for him by someone else and he's now regurgitating this stuff. So I think that's why we pick on him not so much because the ideas there weren't themselves reasonable analogies. That's a good point, but here's another idea about the moving on off of Ted Stevens. We're done with him today. But here's another idea for why we might break up these packets. So we're talking about how routers select the best route for these packets. 
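The turn-taking idea, a quarter of the first guy's email, then a quarter of the next guy's, and so on, can be sketched as a simple round-robin loop. This is only an illustration of the fairness argument, with hypothetical senders and packet labels; it is not a claim about how any particular router schedules traffic.

```python
from collections import deque


def round_robin(queues: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Send one packet per sender per turn until every queue is empty."""
    pending = deque((sender, deque(packets)) for sender, packets in queues.items())
    sent = []
    while pending:
        sender, packets = pending.popleft()
        sent.append((sender, packets.popleft()))
        if packets:                      # still has data: go to the back of the line
            pending.append((sender, packets))
    return sent


def finish_order(sent: list[tuple[str, str]]) -> list[str]:
    """Senders ordered by when their last packet went out."""
    last = {}
    for position, (sender, _) in enumerate(sent):
        last[sender] = position
    return sorted(last, key=last.get)


# Hypothetical outboxes, listed in the order the emails were sent.
outbox = {
    "alice": ["a1", "a2", "a3", "a4"],   # big email with an attachment
    "bob": ["b1", "b2", "b3"],
    "carol": ["c1"],                     # short email, sent last
}
order = round_robin(outbox)
```

Here `finish_order(order)` puts carol first even though she sent last, because her one-packet email only has to wait for a single packet from each of the others rather than for their entire emails.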
And so let's say that I only had space for three so instead of four, allow me to only talk about three packets now. But let's say that I wanted to send these three packets to my SMTP server. And so what I do is I send them first to, let's say, my home router which sends it then to Comcast Router. So this is router over at Comcast. And it sends these three maybe in order, maybe not. It doesn't really matter because we can reconstruct it later. Then what it's doing is the Comcast Router will send, let's say, the first packet to router A. And then the second packet to router A. And then first packet gets sent to the SMTP server. But then router A goes down for whatever reason. Let's say that someone literally cut the cord between the router and the server. Or let's say that the router just got too busy now. And it's not able to do anything. So this other router that was originally passing this data can now make a decision, OK, this for whatever reason has gone down. I should try to send the remaining data through another router. And so it will then send packet C through this second router here which will get passed to the server. Finally, then the server will realize, oh, I've only received two of the three packets. It'll send a reply back to my computer. My computer will say, OK, I'll go ahead and send you back the second packet which will then get routed properly. So there are a variety of reasons why this would be helpful, not the least of which is if some router actually goes down right in the middle of some transactions, such as this, this packet being sent, or these numbers of packets being sent, but also in that, like what David was mentioning before, if you're sending huge amounts of packets. So not just a little one, two, three word email, but instead let's say a movie, for example. Now you will be able to send these packets through different routes. Rather than if you're sending a movie, it gets dropped at this router, for example. You don't have to send. 
Your computer doesn't have to resend the entire thing. It can send only the packets that it needs to. So it, sorry, go ahead. Well, I was just going to say, so it's just a lot more efficient. Oh, OK. So why all this duplication of paths, right? So if Dan's commenting here that you can route around routers that are down, because the data can follow a new path, that seems to be an expensive solution to this problem, right? Why not just have a single path from A to B? So wherein lies the origin of this mesh of connections that really exists out there, whereby you have a whole bunch of routers, and they're not really wired together in series, like this guy goes here, here, here, here, but rather there are really a lot of interconnections so that, in fact, problems can get routed around? Anyone know why that is, in part, the case these days? What the origin of this cleverness, if inefficiency, is? So definitely these days, the more the internet grows, the more additional networks are kind of patched in together. And it tends to make sense not to just patch them into one location, but actually to connect them to multiple points, precisely for the reason that if one point fails, it's ideal if the others can keep you connected. What about the origin of the internet? So never minding claims of who invented the internet on an individual level. What are some of the origins? What is the origin of the internet? I'm going to take over before you get mesmerized by that little fractal. Who invented the internet, really? Besides, who was it that took credit? Was it Al Gore? He's coming to campus, actually, Wednesday, if you'd like to say hello. Yes. But the origin of the internet. Yeah, so there was that kid, but who laid down the physical infrastructure? Military, right? Right, so yeah. So there was this precursor to what we know as the internet a few decades ago, known as ARPANET, from DARPA, the Defense Advanced Research Projects Agency.
I might have butchered a couple of the letters in the acronym. But the origins of the internet were largely militaristic, whereby there were different sites across the United States. They wanted to be able to share data in some form. And this was before there were web browsers and instant messages. The protocols were fairly simpler. And it was about sharing data. But one of the fundamental design decisions way back when that has carried forward today is that there really is no single point of failure. Sites A and B are not connected, typically, by design via just one link, via one possible path. And the reason for this was because it's good military design. If one of your sites gets knocked out by whatever threat or attack, you can literally route around it. Because again, you have this inefficiency. But this resilience, this fault tolerance built in by enabling there to be multiple paths from A to B. And one of the neatest things about the fragmentation that Dan was describing is that, in theory, though this isn't always the case, if you're sending an email or you're downloading a big file, it's quite possible that that email is not going to go via the same path. But rather, it might, in fact, get forked off into four pieces, three pieces, 16 pieces. And actually, those pieces can, in fact, go different ways. In fact, it's so damn clever this setup that the last part of your email, your signature, can even get there before the dear mom part of the email even gets there. Because one of the other purposes of IP is to reassemble those packets, not just at all, but in the appropriate order, thanks to the sequence numbers and the numbering schemes that Dan actually alluded to. But there's one interesting design question here that's worth picking on. 
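The no-single-point-of-failure idea can be pictured with a small graph search. The sketch below is a toy: the router names mirror the board example (a home router, a Comcast router, routers A and B, and the SMTP server), the link table is invented, and a breadth-first search over a complete map stands in for what real routers do with routing protocols and only local knowledge.

```python
from collections import deque

# Hypothetical mesh: each router lists the neighbours it has a link to.
LINKS = {
    "home": ["comcast"],
    "comcast": ["home", "router_a", "router_b"],
    "router_a": ["comcast", "smtp"],
    "router_b": ["comcast", "smtp"],
    "smtp": ["router_a", "router_b"],
}


def find_route(links, start, goal, down=frozenset()):
    """Breadth-first search for a shortest path that avoids routers in `down`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in links[path[-1]]:
            if neighbour not in seen and neighbour not in down:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no surviving path at all


normal = find_route(LINKS, "home", "smtp")
rerouted = find_route(LINKS, "home", "smtp", down={"router_a"})
cut_off = find_route(LINKS, "home", "smtp", down={"router_a", "router_b"})
```

With router A up, the path goes through it; with A down, the search finds the path through B instead; only when both are down does the server become unreachable, which is the case the redundant links are there to make unlikely.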
Because we've sort of preached just how amazing TCP and how amazing IP are because they get data from A to B, they handle fragmentation, and therefore, they let big data and small data get through sort of with equity, and they guarantee that the data will get there, TCP. But can you imagine instances where you'd actually rather data not get from A to B with a guarantee? In other words, are there certain applications, certain contexts in which you know what, you'd kind of rather your computer not bother trying to resend data if it doesn't get there the first time? National security questions. How so? In what? Interesting. So maybe, maybe, if there are interesting national security implications, I'm going to sort of cop out with pushing harder on that and just say, let's focus on something more technical, only because I don't know how best to respond to that. Okay, interesting. So anything, that's actually hints at the bigger picture here, anything real time. So maybe something related to stock quotes whereby you actually care about transacting the deal now or maybe not at all. You don't want to wait five seconds or wait a minute for the data to come again because the price might have changed and your interest might have changed. So that's an interesting idea. And why don't we continue that theme? Other real time applications where you'd kind of rather your computer not bother going backwards in time to resend data, but just let it plow ahead and forget about whatever data was dropped. eBay. eBay, how so? Okay, interesting. Yeah, so maybe if the auction is going on for another 60 seconds, who really cares if you missed the second number 23? You really just care about the one after it, for instance. So that might be the case. Turns out doesn't really apply because HTTP, the protocol used by web browsers and servers, actually by nature uses TCP. So nice idea, but the web itself does not allow for such behavior. In fact, even though there is TCP, there are alternatives. 
And that's where we're about to go with this discussion. But a lot of these high level protocols that we already talked about last week like SMTP, POP, IMAP, HTTP, almost all of your favorites actually do require or at least use TCP. But there are a few that many of you have probably used that don't use TCP because again, they don't care about guaranteed delivery. What else comes to mind in terms of real time? Like that's the key hint here. Nice, nice. So voiceover IP, for instance, the use of the internet to have phone conversations. So it's ideal if you don't lose entire words or sentences that your partner on the phone is speaking. But if you just lose a little like a blip, a split second, you can probably figure out from context what they said or you can just ask the person to repeat it. What you probably don't want to have happen is that the other person is five seconds behind in the conversation than you are. In fact, this already happens today for different reasons. If you're on a cell phone from here, say to someone on the other side of the room and you watch each other's mouths, like they will not be in sync with what you're getting in your ear. Not so much for the same reason, but in that case because of transmission delays and because of encryption reasons, it actually doesn't transmit instantaneously. It's kind of a weird thing, right? Next time you're walking down the street up to someone and they're talking to you, the mouth will not be in sync with what you're hearing. It's really bad if it's out of sync on the order of seconds and not milliseconds. So the other point you made was streaming video. Why would I maybe rather that streaming video not guarantee delivery? What's, can you elaborate? No, no, no, that is the answer. No, that's a good one. But why, what do you mean? You don't want to get the end before the, well, so doesn't that suggest that you do want it to pause and go back and get the bits that were lost? Oh, okay. Okay. 
So streaming video, much like streaming audio, like VoIP, which it effectively is because it's this real time transmission of voice. You know, maybe you want the video just to forge ahead. Maybe you'd rather the screen just jitter a little bit, but forge ahead because otherwise you get that sort of famous buffering, buffering, buffering and maybe you don't really care about that, just forge ahead, especially what types of online or what types of streaming video do you probably, if you're a real nut, really not care, so there's at least certain context I can think of where I'd really rather the video forge ahead. What kinds of videos? Sporting events, baseball games, right? Kind of annoying if you're watching a game online, maybe even you have some bets placed and all of a sudden your sort of experience watching the game starts to get more and more out of sync with reality, right? If for whatever reason, so you might rather just lose the past couple seconds, but let's keep up with the game because there's something to be said for keeping up with the game in real time. How about Skype? Skype too, so Skype is, yeah, no, absolutely. So Skype is used for VoIP communications. It also does support, I believe, video communications and even if the technology's imperfect, even if you drop a few bits, so to speak, maybe you'd rather the protocol just continue. So there is in fact this alternative to TCP that's called UDP. UDP is same idea of a protocol. It gets data from A to B, but it doesn't retry if it doesn't actually get there. And UDP is very commonly used for things like streaming video and streaming audio and even DNS, actually, where the protocol, let it be up to the user. Let it be up to the software to actually go back in time and send the data again. The protocol itself is not going to bother. So it's sort of just an alternative to TCP that doesn't try as hard, but sometimes you'd rather that. Why don't we go ahead and take our five minute break? 
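For anyone curious what "doesn't retry" looks like in practice, here is a small sketch using Python's standard `socket` module to send one UDP datagram from a program to itself over the loopback address. The payload text and the one-second timeout are arbitrary choices for the example. Notice that there is no connection setup and no acknowledgement anywhere: if the datagram were lost, nothing in the protocol would send it again.

```python
import socket

# Receiver: bind to the loopback address and let the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)          # give up after a second instead of waiting forever
address = receiver.getsockname()

# Sender: no handshake, no receipt; just fire the datagram off.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame 1", address)

try:
    data, _ = receiver.recvfrom(1024)
except socket.timeout:
    data = None                   # a lost datagram stays lost: UDP will not resend it
finally:
    sender.close()
    receiver.close()
```

If an application built on UDP does want a missing piece back, it has to notice the gap and ask again itself, which is the "let it be up to the software" point above.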
Okay, welcome back, everyone. So just before the break, David was talking about this other protocol that we hadn't mentioned up until now and it's UDP. And so there is distinct difference between UDP and TCP in that when packets are sent via UDP, we really don't care if one or two is dropped. That is that it's probably not very good for, say, an email because we want to be able to get all of the packets and reconstruct the emails we had originally typed. But maybe for some things like voice over IP, which are more time dependent or streaming video or streaming audio, which require that we get all of these packets just in the right time rather than being able to reconstruct it exactly as it was originally intended, then we are going to use UDP over TCP. And so many times you will hear of the acronym TCP IP, which indicates both or these two different protocols of TCP and IP being used at once. However, they are actually two separate protocols where TCP really handles, really is the thing that guarantees this transmission of the data and that it will be received, maybe not in the original order, but it will all be received and that the receiving computer can then reconstruct it in the intended order, whereas IP as is from the moniker IP address is actually used just for addressing. So we can use TCP IP to send data which uses both the TCP protocol and the IP protocol. However, we could also use UDP, which will just send all of this data and we would have to address it using IP. So just because we've always seen these two acronyms together, TCP slash IP, doesn't necessarily mean that they are one and the same. They are actually two distinct protocols that we should consider. And so that's why it's at least important when we're talking about UDP to understand the difference because UDP packets do still use IP in order to address and to send the data over the internet. 
However UDP and TCP, the main differences there are just how the data is restructured or how the data is packaged in these packets that we have been talking about. And so now that we have a better concept of some of this lower level stuff, I thought it would be good now to talk at least in a little bit more detail about how these DHCP or DNS services work. And so last week, we sort of talked about how DHCP does what? Right, so that is how you would be able to get an IP address from your ISP. More specifically, you'd be able to do it dynamically. So when I bring my laptop in here and I connect to the wireless network, I'm given an IP address by some server on the network. However, when I close my laptop, bring it home, reopen it there, then I will get another IP address because of DHCP that would be dynamically assigned. And so this protocol actually does work over using UDP, but what it does is it sends, when your computer just sends out a broad request, it basically sends it to an IP address that is very, very high, probably something like 255.255.255.255, so those remember the IP address is the four numbers. And that is generally known as what's called a broadcast address, which will the router that receives that number that receives a packet that is destined for that number will then send it to every computer on the network. And so it's sort of the same thing as coming into a room such as this and just announcing I need an IP address and whichever one of you happens to be the server, the DHCP server will be able to respond to me. However, if I came in and I had a message that was destined just for somebody, I could reference that specific IP address, which would be akin to just putting a message in an envelope, putting the proper address in it and just handing it to the correct person, for example. So there is a little bit of a difference in that we don't necessarily address a specific computer because we don't know what the DHCP server's IP address might be. 
We just sort of broadcast this message to everybody, or excuse me, we broadcast this message to everybody on the network and hopefully one will respond correctly. So notice that I said correctly, there are a number of things called DHCP, these rogue DHCP servers and these tend to be a problem, not so much in home situations or home networks, but more on campus situations such as this where somebody might be tinkering with their computer and let's say they're on a network in a dorm. Suddenly they somehow enable a DHCP server on their own computer. Now when somebody else plugs in their computer to this dorm network, they're going to broadcast this message of requiring an IP address and what might happen. Right, well it really depends which responds first, but there is this very distinct possibility that this guy who's tinkering with his computer, his DHCP server will respond instead of the proper DHCP server. And so depending on how this is configured, you can have all sorts of problems. It will assign it the wrong IP address. It might assign a duplicate IP address and remember we talked about how that's bad because two computers on the internet shouldn't have the same IP address. And so this actually is somewhat, well it's not maybe that big of a problem in the real world but it is a big problem in dorms and many times they will actually, if they find that you are running a rogue DHCP server, they'll have to shut you off because this actually causes quite a large problem. So the reason that this isn't such a big deal, let's say on the home network, let's say you have a cable connection to your ISP, let's say to Comcast and you run a DHCP server, most likely Comcast is smart enough to not send all of these broadcasted requests back to all of the home users. They'll be able to catch them and send them just within their own network to their own DHCP servers. 
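The "whichever responds first" problem can be shown with a very small simulation. Everything here is hypothetical: the server names, the response delays and the addresses are invented, and the real DHCP exchange has more steps than a single request and a single offer. The only point is that a client shouting to the whole network has no way of knowing which answer came from the proper server.

```python
from dataclasses import dataclass


@dataclass
class DhcpServer:
    name: str
    delay_ms: int          # hypothetical time this server takes to answer
    next_address: str      # the address it would hand out

    def offer(self):
        return {"server": self.name, "ip": self.next_address, "delay_ms": self.delay_ms}


def broadcast_discover(servers):
    """Every server on the network hears the request; the client takes the first offer."""
    offers = [s.offer() for s in servers]
    return min(offers, key=lambda o: o["delay_ms"]) if offers else None


campus = DhcpServer("campus-dhcp", delay_ms=20, next_address="10.1.4.57")
rogue = DhcpServer("dorm-laptop", delay_ms=5, next_address="192.168.0.2")

healthy = broadcast_discover([campus])
hijacked = broadcast_discover([campus, rogue])
```

In the `healthy` case the client gets a campus address. In the `hijacked` case the dorm laptop answers faster, so the client ends up with an address from the wrong range, which is the kind of misconfiguration that gets a rogue server shut off.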
However, if you're running a DHCP server, they most likely will shut you down anyway but that's beside the point. The point is that these DHCP servers, although you can certainly set them up yourself, it isn't necessarily always a good idea. However, on your home network, you might want to set one up yourself if you want to be able to have all of these computers on your network obtain their own IP addresses. Luckily, this service is provided by all of these home routers that we buy. So these home routers do a lot more than just route information. They also contain, like we mentioned last week, a DHCP server which acts to provide all of the IP addresses within your home. Otherwise, you would have to have a separate computer that's always on just giving out IP addresses on your home network. And so, there are some other things as well that we could go into just a little bit more detail and that is DNS. So what is DNS? Right, so the domain name server. So it's a service which essentially acts like an address book on the internet or like yellow pages where when we want to know what the IP address of say Google.com is, we go to the DNS servers and they open up their very large yellow pages, so to speak, look up Google.com's IP address and sends it back to us. However, the idea that we presented last week is very, very simplistic. There's a lot more levels of hierarchy in this DNS system that can occur. And what we were talking about is the DHCP server can actually reply to your computer not only with the IP address that your computer should use, but also the IP address of the DNS servers or the IP addresses rather of the DNS servers that your computer should use when it needs to look up the IP of some particular domain. But every, so you can imagine that there's a lot of DNS servers around in the world, so how do we know that all of these DNS servers have the right information at all at the same time? 
So if we are to borrow a sentence from David's book and say if you put on your engineering caps for just a moment, how might you do this if you needed to make sure that all of these DNS servers actually have the same information up to date? Any ideas? Yes. Right, so they might talk to each other and that is correct. They do communicate in a certain way, but at an even higher level, how might we, how would the servers know what to talk to? So how would one DNS server know to talk to, how would one DNS server know which other DNS server to talk to? So we might have this idea of a hierarchy. So sort of like when you, well, let's see, that might be a poor analogy, but so I know that it's tough to see the board from the other side and I apologize. I'll try to make it as large as possible, but when we are talking about DNS servers, we actually have this hierarchy of servers that exist. So let's say that this is my computer right here and I am going to make a DNS request to that DNS server that my ISP said is the DNS server that I should use. So remember, when I first connect to the network, I contact my ISP and they reply, I contact my ISP using DHCP, they reply with my IP address and the DNS servers that I should use. So now when I want to look up Google.com for example, what I will do is I will contact the IP address of the DNS server that they provided. So let's say this is an ISP's DNS server. Let's see, DNS of ISP. I will ask that server what the IP address of Google might be. Now it might know what the IP address is, but maybe for some reason it doesn't. So what does it do? Well, remember that we've divided these domain names into several sets. So we have a subdomain, a domain, and what is the last bit, the .com, the .edu? What is that called? Right, exactly, top level domain. So this DNS server knows that there exists other DNS servers that might know more information and specifically it might go to the very top of the hierarchy. 
It knows that since I'm requesting Google.com, it should ask the top level domain, this .com DNS server, what the IP address might be. So it passes another note over to the .com DNS server, which might or might not know. So it would reply, well, it should know at that point. It will reply with Google.com's IP to the ISP's DNS server, which will then respond back to me. Now the reason that this works is that the ISP's DNS server would then be able to remember that number. So it might have some expiration date and might say, okay, well, this number might change in the next 48 hours, why don't you check back then. But for all of the people that connect to this ISP's DNS server, as soon as it's found out the most recent IP address for Google.com, it can remember that, it can cache it. And so now all of the requests that come in for Google.com from all of the ISP's users will be a lot quicker. But it's this sort of hierarchy that allows them all to remain in sync, though there is this sort of time delay. So remember that I said that this ISP's DNS server, so Comcast's DNS server, might remember what Google.com is for the next 48 hours, but what if that IP address changes during that 48 hours? Well, this ISP's DNS server reports the wrong IP address. And so one of the things that you'll see, especially when we start talking about actually obtaining your own domain and pointing that domain using DNS to your hosting, is that it will say, well, okay, that's fine. Your domain now points to this IP address. Please allow 24 to 48 hours for this to take effect. The reason for that is this hierarchy: the top level DNS servers might know the IP address of your domain, but it may not have filtered through to these lower level DNS servers, such as the ISP's DNS servers or Harvard's DNS servers or any number of DNS servers that run on the internet.
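The remember-it-for-48-hours behaviour, and the stale answer it can produce, is easy to sketch. The two classes below are stand-ins invented for this example, following the simplified picture on the board in which the .com server answers directly; the IP addresses come from a range reserved for documentation, and time is just a counter of hours rather than a real clock.

```python
class TldServer:
    """Stand-in for the .com server: the source of truth in this sketch."""

    def __init__(self, records):
        self.records = dict(records)

    def lookup(self, name):
        return self.records[name]


class CachingResolver:
    """Stand-in for the ISP's DNS server: asks upstream once, then remembers."""

    def __init__(self, upstream, ttl_hours):
        self.upstream = upstream
        self.ttl_hours = ttl_hours
        self.cache = {}            # name -> (ip, hour at which the entry expires)
        self.upstream_queries = 0

    def resolve(self, name, now_hour):
        cached = self.cache.get(name)
        if cached and now_hour < cached[1]:
            return cached[0]                       # fast path, possibly stale
        self.upstream_queries += 1
        ip = self.upstream.lookup(name)
        self.cache[name] = (ip, now_hour + self.ttl_hours)
        return ip


dot_com = TldServer({"google.com": "203.0.113.10"})
isp = CachingResolver(dot_com, ttl_hours=48)

first = isp.resolve("google.com", now_hour=0)     # goes upstream
dot_com.records["google.com"] = "203.0.113.99"    # the site moves to a new address
stale = isp.resolve("google.com", now_hour=10)    # still served from the cache
fresh = isp.resolve("google.com", now_hour=48)    # cache expired, asks again
```

The lookup at hour 10 is quick but returns the old address, and only once the 48 hours are up does the ISP's server ask again and pick up the change, which is where the "please allow 24 to 48 hours" notice comes from.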
They may not know what it is, or it may store an old version of the IP address, and it must ask this top level domain what it might be. So how might we change this diagram if we're talking about a subdomain? So I talked about Google.com, so let's move away from that. Let's say that we found what we were looking for on fast.harvard.edu. How might we have to change this diagram if my ISP's DNS server does not know what fast.harvard.edu points to as an IP address? So the most obvious one is to change this .com to a .edu. So now instead of asking the .com top level domain DNS server what it is, we're going to ask the .edu. So now it can say, for example, if we're looking for fast.harvard.edu, it might say, well, I don't know what fast.harvard.edu is, but I do know what harvard.edu is. So it returns to us the IP address of harvard.edu, or more specifically the IP address of the DNS server for harvard.edu. And then we can ask harvard.edu, their DNS server, what is the IP address of subdomain fast.harvard.edu? So now this .edu DNS server is replied with the IP address of harvard.edu's DNS. So we then ask it, what is fast.harvard.edu? And it can reply with the proper one. So the reason for this is that, well, we can have this hierarchy. So the top level domain doesn't necessarily care about all of these subdomains that exist. It might only care about the actual domains themselves, and we can manage here on Harvard's campus our own DNS servers that we can change subdomains at will without having to contact any of these higher level domains. Any questions on that stuff? Anything you want to add? So that's a lot of power if your ISP controls these DNS servers, right? If you are looking up a website, whether it's cnn.com or anything.com or any other TLD, I mean, to be clear, who is it that's responsible for giving you back an answer more often than not as to what the IP address is for that server? 
I mean, it's Harvard's DNS server if you're here or it's Comcast's DNS server if you're at home. And this actually has had really interesting implications over the past several years. There's been an occasional story where an ISP has decided to take it upon itself to do, what do you think if a user mistypes a domain name when trying to visit a webpage? I mean, with this power, what could an ISP do if you're trying to type cnn.com and you goof in type cmm.com, for instance? I mean, typically a browser would do what? Like what are you used to in those cases when you screw up and you mistype a URL? Unable to find page? Yeah, unable to find page 404, error not found. I mean, some kind of error, but you don't get the day's news. You don't get whatever it was you were looking for. But if your ISP controls these DNS servers, what could, in theory, they do? Yeah, exactly, to summarize for the camera. So steer you somewhere that's advantageous for them. And in fact, there have been a couple ISPs over time that in the event of a typographical error or simply you're trying to visit a domain name that does not exist, they return you to something.comcast.com and guess what's plastered all over the screen? Advertisements, yeah. Yeah, no, so it's been this really interesting issue about the power that ISPs have over the internet and their violation, potentially, of what the spirit of the internet is supposed to be. Some ISPs have taken this one step further and it's not directly related to DNS, but there have even been some ISPs and this has even received political attention because whereby in theory, an ISP, because they are the one sending the bits from the internet to your computer, they can look at those bits certainly and they certainly have logs and this kind of stuff can be subpoenaed and this has always been the case and we'll talk about that kind of stuff in our security lectures. 
But if they also have access to those bits and thus the web pages that are being sent down to your browser, what could they do, besides extracting information from those bits, if they wanted to get really obnoxious or greedy? So they could corrupt it, but that might not be in their best interest, because that's when you pick up the phone and call Verizon instead. So corrupting your data is probably not in their interests. Think toward this end of monetization. I mean again, think about, so we've given you, at least tonight, some of the basic building blocks here and the tools with which you can build interesting applications on top of the internet, but with those capabilities, with those building blocks come some risks, come some undue powers perhaps. So you're the ISP, you're thus responsible for sending the bits down to the user's computer. What could you do to those bits that might make you an extra buck? Sorry? So you could steal them, but you already own them really, they're already yours. So okay, so that's true. So definitely if you have some malicious ISP or some malicious employee within the ISP, you definitely might wanna worry about your online bank transactions or whatnot, but what did we say mitigates that, in part at least, two weeks ago I think? So encryption. When you have URLs that start with HTTPS, SSL in general is the protocol used to secure web page transactions. In theory, you could still have a bad guy in the middle doing bad things even with your secure connections, but in general with SSL and URLs with HTTPS, even though the bits do flow through the ISP, all they see is the scrambled data going back and forth. So with high probability, no one can get into that particular data. But suppose that these are just basic web pages. You're pulling up Google, you're pulling up Facebook, you're pulling up CNN.com.
In theory, your ISP could just insert a little bit of what's called HTML, which we've mentioned before and we'll definitely spend some time on in a couple of weeks, insert some HTML into the web page and guess what that HTML might contain? More ads, right? So this too has been a really interesting issue as to what is appropriate because technologically all of this stuff is possible and you can infer all of these technological capabilities just from some of these basic building blocks. And if I dare say interesting exam questions, right? You sort of, if you understand how the bits are going from A to B and whom they're flowing through, I mean these are interesting kinds of sort of thought questions you can have. What is actually possible? Well certainly it's possible for additional bits to be injected that happen to collectively represent some advertisements. So tangential to DNS to be sure, but similar in spirit to some of the abuses perhaps of these technologies. And in addition to that, so instead of adding bits, what if let's say Comcast had some sort of a deal with Google where all of these bits that are destined for Google are somehow prioritized? Let's say that now there's, they form some sort of partnership now where all requests coming into Google from Comcast will be sent just sort of sent to the beginning of this queue. Remember that we were talking about how packets can be queued in a router, for example, to be sent and what if they're somehow ordered? So now all of the ones destined for Google will get sent there first. What can you, what might happen then to all of these other sites, all of these sites that are a bit smaller that may not be able to afford these really fancy prioritization deals or these partnerships with these ISPs? They get pushed to the bottom of the queue. 
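The queue-jumping scenario can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name `send_order` and the destination names are invented, and a real router's queueing is far more involved. The only point is the difference between sending packets in arrival order and letting one favored destination cut the line.

```python
import heapq


def send_order(packets, favored=None):
    """Return destinations in the order a toy router would send them.

    With favored=None every packet has the same rank, so the order is
    simply first come, first served. With a favored destination, its
    packets get a better rank and are sent before everyone else's.
    """
    heap = []
    for arrival, dest in enumerate(packets):
        rank = 0 if dest == favored else 1
        # arrival number breaks ties, preserving order within a rank
        heapq.heappush(heap, (rank, arrival, dest))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical arrivals at the router, oldest first.
arrivals = ["small-site.example", "google.com", "cs-e1.example", "google.com"]
neutral = send_order(arrivals)
prioritized = send_order(arrivals, favored="google.com")
```

In `neutral` the packets leave in the order they arrived. In `prioritized` both google.com packets go first and the two smaller sites are pushed to the back, even though one of them arrived before anything else.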
So now all of a sudden somebody that tries to visit my website, for example, or even computer science one, we don't have the sort of funds necessary to make these multi-million dollar deals with one of these cable companies, we will not be able to get packets from you guys as quickly as you normally would. And so this brings up actually, this is a big problem especially last year, one of the big buzzwords that you were always hearing about was net neutrality. And that's exactly what this is, was that being able to neutrally just send all of this data as it comes in rather than reprioritizing these packets that were destined for some particular company that may have a partnership. And so a lot of people say, well, this is a very good thing or a lot of companies specifically say that this is a very good thing. Now you'll be able to access our services much more quickly, which may or may not be true, but what happens to all of the small websites that had existed or they may be squashed, if I can answer my own question, they just won't be able to receive packets as quickly and users may become frustrated. And so certainly what happens when you guys try to visit a web page and it just takes forever to load? One of the things I try to do is, yeah, so you might move on from it, you might try to refresh it real quick, you might give it another chance or two, but if it just does not load, you might move on to something else. And so somebody that has a web page that actually deserved your time to look at it may be taken over by some of these other packets that were taken your spot, that had taken your spot in the queue. And so this is certainly one view on this Net Neutrality Act, but that's really essentially what was going to happen. It had it actually passed. And I think that it always seems to come up every few months or every other year or so, and luckily it always gets shot down. 
But if you remember that video that we showed from Ted Stevens, it was actually referencing the Net Neutrality Act. So just this idea where partnerships could not be formed that would allow some packets to be reprioritized over other packets. So let's try to take some of these building blocks and flip it back on you. So you go home tonight and you decide to get on that internet and send someone an email. You hit send or, you know what? Well, you already did email. Let's back up. So you decide that you wanna check the latest scores for some game or you decide you wanna check what happened in the episode of, what's tonight, Monday? I don't even know what's on TV Monday nights. What's that, Heroes? Okay, so you, I'm sorry. Interesting. Okay, so you missed the episode of Heroes, you don't have TiVo, you gotta read about it on the internet or download some video that someone's made of it. So you sit down, you click the link, and bam, connection lost. Like what do you do when your internet connection is broken for some definition of broken? Yeah? Delete System32. What's that? Delete System32. Delete System32. That would just hose the whole operating system. All right, so that's the extreme case. Just rip, so that's tech speak for break your computer more. What else could you do? Yeah. What's that? So I was just playing off, so geek speak here. There is a very important folder called System32 inside of the Windows directory of a Windows computer. Very important files are in there. So your wonderful suggestion, which is now gonna confuse the whole class, was to go in and delete this folder, thereby making the problem worse. Lost all credibility now. Go on, go on. Can I steal the focus for a second on the screen? Yeah, I think everybody's sick of this diagram. So yeah, let me pull up the PC here just so that you can see. So while you're bringing that up, I guess I can explain it.
So ipconfig is an application on Windows computers that allows you to basically perform some IP operations on your computer. And one of the options that you have is this release and renew, where with release you are essentially getting rid of this IP lease, this IP address that you were given by a DHCP server. And by giving it another command called renew, you are asking the computer to re-send this broadcast message to all of the servers and ask for a new IP address. So I'm gonna go ahead and run ipconfig. My computer is crazy geeked out with way more configuration details than a normal person has. So I'm just gonna scroll up to and home in on what's actually relevant to a normal person. And that's going to be, let's say, this stuff here. So what I've highlighted in white here is apparently all of the TCP/IP related information for my wireless LAN adapter, Wireless Network Connection. So these are all now familiar terms, presumably. I seem to have a connection-specific DNS suffix. Well, that just means what domain am I in. And that's something that was automatically assigned to my computer, take a guess by what kind of server. Not DNS, but DHCP. So a DHCP server dishes out IP addresses and also details like this. In fact, the IPv4 address, version four being what most computers still use today, though the world is very gradually changing to what's called IP version six, which we did mention two weeks ago, is my IP address, 140.247.228.79. My subnet mask is useful for routing. Often it's 255.255.255.0. And my default gateway, what's a gateway? Exactly, so a router is otherwise known as a gateway. These terms are synonymous. So in order for any data to leave or get to your computer, you yourself have to be connected by one hop to a local router, most likely your ISP's, like Comcast's, or in this case, on campus, Harvard's nearest router. So Harvard might have a whole bunch of routers, but one of them is most proximal to me.
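As a rough illustration of why the subnet mask is "useful for routing," the Python sketch below uses the standard `ipaddress` module to decide whether another address sits on the same local network or has to go out through the gateway. The helper name `same_subnet` is made up, the IP address and mask are the ones read off the screen above, and the other two addresses are just examples chosen for the comparison.

```python
import ipaddress


def same_subnet(my_ip, mask, other_ip):
    """True if other_ip falls inside the network implied by my_ip and mask."""
    # strict=False lets us pass a host address rather than the network address
    network = ipaddress.ip_network(f"{my_ip}/{mask}", strict=False)
    return ipaddress.ip_address(other_ip) in network


my_ip = "140.247.228.79"
mask = "255.255.255.0"

# The network is whatever the address and mask have in common.
network = ipaddress.ip_network(f"{my_ip}/{mask}", strict=False)

local = same_subnet(my_ip, mask, "140.247.228.18")   # another campus-style address
remote = same_subnet(my_ip, mask, "192.0.2.10")      # documentation-range address
```

With a mask of 255.255.255.0, only the last number of the address distinguishes machines on the local network. Anything that is not on the same subnet gets handed to the default gateway, the first hop.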
And so one of the other things DHCP does is proactively inform your computer what its first hop is. So if we were to run traceroute from my own computer, and if it worked, if Harvard weren't filtering, the very first hop would be what IP address? That guy, right there. So useful information still, but you suggested now a diagnostic technique. So many of you probably have Windows at home. You can't really break your computer; when in doubt, just restart your computer. But I'm going to run ipconfig and not just hit Enter, but the way you often control the behavior of programs like this in Windows, these boring programs that are at the command line, so to speak, text based, is to do a forward slash and then an option. And in this case, the option is, as Dan said, release. And what this is going to do is release your so-called DHCP lease. All of the details that you were dished out by the DHCP server are now gone. So now I've gotten rid of all that information. And if I scroll up for a moment, notice that next to my wireless LAN adapter, guess what happened? I'll go ahead, whoops, whoops, all right, end of demo. And notice if I go ahead and run ipconfig again and scroll up to the relevant lines for my wireless adapter, notice that this has happened. So these are the same fields, but the numbers are different now. So notice it's kind of gone back to some defaults. In fact, one of the rows is blank, but notice that one is this 169.254 address. If you ever see an IP address starting with 169.254, that means your computer's still not yet working on the internet because this is what's known as an auto configuration address. It's the address your computer just takes on by default before it has a useful IP address. So if you notice this, this means you haven't gotten a DHCP lease, you haven't been given an IP address. And so what did Dan say was the opposite, so to speak, of ipconfig release? Renew.
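The "169.254 means no lease yet" rule of thumb can be checked mechanically. The sketch below uses Python's standard `ipaddress` module, whose `is_link_local` property is true for the 169.254.0.0/16 auto configuration range. The function name `looks_like_no_lease` is invented, and 169.254.10.20 is just an example address in that range, not one taken from the demo.

```python
import ipaddress


def looks_like_no_lease(addr):
    """True if addr is a self-assigned (auto configuration) address.

    Seeing one of these usually means the computer asked for a DHCP
    lease and did not get an answer.
    """
    return ipaddress.ip_address(addr).is_link_local


after_release = looks_like_no_lease("169.254.10.20")   # example self-assigned address
after_renew = looks_like_no_lease("140.247.228.79")    # the leased address from earlier
```

Note that the check is on the first two numbers, 169.254, so an address that merely starts with 169 is not automatically a sign of trouble.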
So if you take nothing else away at a hands-on level tonight, ipconfig space slash release and ipconfig space slash renew are perhaps the two most useful commands you can know. In fact, frankly, if you have someone from Comcast come into your home and diagnose some network problem, these are among the only two commands that they're ever going to type on a fairly technical level to try to diagnose the problem. And it's pretty equivalent, frankly, to just rebooting your computer. The same thing happens, a renew happens when you reboot your computer. So let me go ahead and do this. So renew, Enter. So I'm getting a couple errors because I don't have cables plugged in for my other network connections. So that's fine, it seems to be taking a little moment here. And I'll concede that Windows, both XP and Vista, they're a little buggy when it comes to this stuff. Frankly, sometimes the only solution or the best solution is to reboot the computer. I mean, I do it, Dan does it, these computers are imperfect. This time it did seem to go through. So let me scroll up, even though it took a moment. And now I seem to have some valid numbers back. They might be different. I don't remember what they were a moment ago, but that's the nature of DHCP. But now I'm back online. So the reason you might do this is, frankly, just that something went wrong. Maybe the DHCP server died, your network connection died, and therefore your DHCP lease expired, but your laptop hasn't realized it yet. In short, this is just one of the tricks that should be in your toolkit. But again, rebooting can often fix these things as well. But rebooting is sort of the pre-E1 solution. Now that you've been in this course, this is what a true savant does. So a reboot essentially just does the same thing as an ipconfig release followed by a renew. You wanna switch it over to mine? So just because we are platform agnostic here, I figured I would show you essentially the equivalent on a Mac.
So we do have this idea of a terminal, which is similar to the Windows command line. And you can usually find it in your Dock, or you can find it more specifically in your Applications under Utilities. But here, if you type in ifconfig, it shows you a slightly different layout, but essentially the same information. The one that we are interested in here is en0. You'll usually see a whole list of different ports, or NICs as they're called, network interface cards, that might be listed here, but the one that we are interested in is en0. Other ones might be eth0, eth1, et cetera. And anyway, this gives us a little bit more information but a lot of the same stuff. So you can see on the second line, inet 140.247.228.18, which we can infer, since it's very similar to David's, is my IP address on this computer. Now, David mentioned the idea of a subnet mask and we have that here as well, but it's been converted to hexadecimal, where it's a series of Fs and zeros. So we don't really have to worry about that, but there's a broadcast address, which the Windows computer doesn't show, but it doesn't really need to. It's essentially just a mash of the IP address and the subnet mask. And so remember when I was talking about this broadcast that's going to occur when I need to broadcast some information? That is the IP address that defines this broadcast, where if I'm going to announce to every computer that I need some bit of information, then I would use the broadcast address. Yes? Is there other information you can get from the command? Oh, what else can you type in there? Well, there are a lot of commands that you can type into the command window to get more information about the computer. Typically on a Windows machine, for a long time, you could get the most information quickly about your IP setup by using ipconfig.
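For the curious, here is a small Python sketch of how the hexadecimal mask and the broadcast address relate to the numbers David showed on Windows. The mask value `0xffffff00` is an assumption (the transcript only says "a series of Fs and zeros"); it is the hex spelling of 255.255.255.0. The function name `describe` is made up for illustration.

```python
import ipaddress


def describe(ip, hex_mask):
    """Return (dotted subnet mask, broadcast address) for an ifconfig-style pair."""
    # ifconfig on a Mac prints the mask in hex, for example 0xffffff00
    mask = ipaddress.ip_address(int(hex_mask, 16))
    network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
    # The broadcast address keeps the network part of the IP address
    # and sets every remaining bit to 1.
    return str(mask), str(network.broadcast_address)


dotted_mask, broadcast = describe("140.247.228.18", "0xffffff00")
```

So the "mash" of the IP address and the subnet mask is this: wherever the mask has Fs the broadcast address copies the IP address, and wherever the mask has zeros it is filled with ones, which gives 255 in the last position here.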
However, a lot of other information you can more easily gather from other aspects just in Windows itself. Like if you go to the system properties, for example, you can get version of Windows and how much memory your computer has, et cetera, et cetera. And if you go to the, I forget, what is it called? The device manager, I think it's called in Windows. You can get still more information about the parts inside of your computer. And similarly, I'm just showing here the text-based representation on a Mac just to compare. However, you can get just as much information. I believe you can get, let's see, I actually haven't worked with Vista because I'm scared of it, but I think now you can go to, there's probably GUI ways, or by GUI I mean this graphical user interface with Windows and such where you can access the same amount of information. However, it's just sort of a mainstay now to retrieve all of this information via the command line. Anyway, when on a Mac, it's usually easier to get all of this information and manipulate it by going to the system preferences. You can see that we have now, I did it all without talking about it. If you just go to the network and now you can see that I'm connected via airport which is Apple's fancy name for a wireless connection. Click on Advanced and under TCP IP, you can see that I have pretty much the same information if not more. And so now I see very similar now to what David showed before. The version four address, or the version four of IP address, subnet mask, the router or the gateway since they're essentially synonymous. You can see that I also have a funny or a fancy renew DHCP lease button which does exactly the same thing as release followed by renew. Or if I clicked it, you would get rid of my IP address for just a brief second while it contacts the DHCP server and gets another IP address. You'll notice though that the DHCP server was pretty smart and realized that it's the same computer. 
So it gave me the same IP address without too much of a hassle. We do actually see additional information as well, like DNS. Now here we can see that they have given me three DNS servers to contact along with the same search domain, client.fast.harvard.edu. Let me go ahead so that you can go home, those of you with PCs at least, and feel like you're finally owning your machine and taking control of it. What I did to get to that black and white prompt here, which you can do as well if you have XP or Vista: go to your Start menu, go to Run, and then go ahead and type quite simply cmd, for command. And what that will bring up is this command prompt, so to speak, this black and white window, and notice you can only type at it. So there's not really a mouse interface for the most part, but there are three commands that I'd propose you play with tonight or tomorrow, just so that you actually feel like, wow, you're finally doing some neat things with this, if arcane. nslookup, for name server lookup, which we used a couple weeks ago to look up IP addresses for domain names. ping, which sends a special type of packet from you to a server and tells you how many milliseconds it took for that packet to get there and back. So in fact, traceroute uses ping-like packets to get those numbers that we saw earlier. And the last one on Windows is an intentional misspelling. It's traceroute, but just tracert, and then again the name of the thing you want to trace. And again, provided your ISP doesn't have annoying restrictions like Harvard's network does, otherwise I would simulate these again on my own computer, you'll see the path from A to B, and if you see a bunch of asterisks, that indicates that there's something going on that's getting in the way of answering the question, where are these hops? But if you guess a few different host names you'll often find a full path like we seemed to tonight for the .jp address. Well, it's not traceroute as it was when I typed it before; in Windows it's just tracert.
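Since the spelling of these commands differs by platform, here is a small Python sketch that picks the right one. The function names `trace_command` and `ping_command` are made up; the facts encoded are that Windows uses `tracert` where Linux and Mac OS use `traceroute`, and that `ping` takes its packet count with `-n` on Windows and `-c` elsewhere. The sketch only builds the command and does not run it, because on a filtered network like the one in this room the commands may just print asterisks or time out.

```python
import platform


def trace_command(host, system=None):
    """Build the route-tracing command line for the given operating system."""
    system = system or platform.system()   # "Windows", "Darwin" (Mac), "Linux", ...
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, host]


def ping_command(host, count=3, system=None):
    """Build a ping command line that stops after `count` packets."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


windows_trace = trace_command("cnn.com", system="Windows")
mac_trace = trace_command("cnn.com", system="Darwin")
```

To actually run one of these from Python you could pass the list to `subprocess.run`, for example `subprocess.run(trace_command("cnn.com"))`, though typing the command at the prompt yourself is the more instructive exercise.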
Yeah. There are spaces, so to be clear, there are spaces there. Yeah. So all of these commands you can also find on Linux, Unix and Mac computers if you open up a terminal window like we just did. Yeah, and actually let me make mention of one thing. Those of you with Vista, when I said go to the Run prompt here, you might not have it by default; in the interest of sort of simplifying things, Microsoft took away some of these features. So what you might want to do instead if you have Vista is hold the Windows key, the one that has the little Windows flag, and hit R for run, and you'll get that prompt as well. So that's the alternative way. But what? Okay. Okay. Can you switch me? Just to steal this back. So you can actually access all of these same commands. So there is also ping, so you can do ping cnn.com, and again we are being blocked. As you can see, it says that we've transmitted three packets. We received zero packets, so we have 100% packet loss between us and cnn.com. However, it doesn't necessarily mean that all packets are being blocked; it's specifically these ping packets that are not getting through. If I were to go to cnn.com, the TCP/IP packets, specifically those of the HTTP protocol, would be allowed through. We do have nslookup as well, where you can do the same thing, look up this information, and rather than tracert it's actually traceroute, one word. So that's really the only difference between Windows and Mac computers in this case, that it is the full word traceroute. However, we actually have a really fancy application that comes with our Macs called Network Utility, and it also is in the Utilities folder of your Applications, where you can see through these tabs we actually have a graphical version of all of these same things.
Ping, where we can ping a specific address. Lookup, where through DNS we can look up the IP address of these domains. Traceroute. There's a variety of other ones: Whois, which gives us even more information about a domain, which we will probably cover a little bit later. Some of these are generally not a good idea to run. For example, if you run a port scan on a computer that's directly connected to an ISP, well, it depends on where you're doing it. So let's say you port scan some computer, let's say like cnn.com; your ISP will generally get mad at you and will probably shut you down. So unless you know what you're doing, I recommend against port scanning. However, it can be a very useful tool, and you can actually get an enormous amount of information from this sort of GUI application. In fact, your other homework assignment is: find an email that Dan has sent you, look at the email headers via like the options menu, per past discussions. Some people are writing this down. Find his IP address and then port scan Dan's computer and see what you find. That's funny, yeah, I drop all those packets so no worries. Most home routers frankly would block you from these kinds of attacks, these kinds of threats. So in fact we can tie this together to last time's discussion: why would you care to port scan a computer? Yeah, so find an open door, so to speak. Remember a port was like a numeric identifier for a service, like 80 was a web server, 25 was a mail server. So port scanning means just checking a range of numbers and saying hello, hello, hello, hello, and if you find a door, so to speak, that responds with yes, that is, there's something listening, a server listening on that port, that doesn't mean that you can get into the server, that doesn't mean you can compromise it, but it means there's a potential opportunity there, especially if that server has some bug in it that you can exploit, which is precisely how some machines get taken over.
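The "hello, hello, hello" idea is simple enough to sketch in Python with the standard `socket` module. This is a bare-bones illustration, not a substitute for a real scanning tool, and per the warning above it should only ever be pointed at your own machine; 127.0.0.1 is the address that always means "this computer." The function name `scan` and the half-second timeout are arbitrary choices.

```python
import socket


def scan(host, ports, timeout=0.5):
    """Try to open a TCP connection to each port; return the ones that answer."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when something is listening on that port,
            # and an error code (rather than raising) when nothing is.
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


# Knock on a few well-known doors on this machine only:
# 25 is mail, 80 is web, 631 is IPP (printing).
found = scan("127.0.0.1", [25, 80, 631])
```

What ends up in `found` depends entirely on which services happen to be running on the machine, so an empty list is a perfectly normal result. A port showing up here only means something is listening, not that it can be broken into.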
So we can actually try it on your own computer. So we have this idea of an IP address that represents your own computer, and it's usually, or actually always, 127.0.0.1. That IP address never means anything but your own computer. There's another name for it as well, called localhost, and I can actually do a port scan on localhost, which is this laptop right here, and we can see what open ports are available. So we can see that we have something here called IPP on 631. We're most likely not going to find anything that's too interesting because I like to keep my firewall on and keep services off. However, if I were to enable a service, so let's say I go to Sharing and let's say I turn on Web Sharing, which is really just a web server, I can scan myself again and you will see that now I have an open port 80, HTTP, on this computer, which may or may not be a security risk, but in this case it is; there really is no reason for me to be serving web pages off of my laptop. I wonder if there's anything else open. I kind of doubt it, but I hope not. Why? What? So my laptop, even though it can receive web pages, is almost always a client for those web pages. So when I go to, for example, Firefox and I try to go to cnn.com, or rather google.com, my laptop acts as a client which requests a web page from Google's servers. So this is the opposite: my laptop would then act as a server, where somebody on the internet could access a web page on my computer. So let's say that I have it on right now, and what was the IP address that I had? ifconfig, oops, let's see, my IP address is 140.247.228.18. I'm curious to see, since I have just enabled a web server on my computer, presumably I can type the IP address of my computer into a web browser and see if I am hosting any web pages. If I hit Enter, indeed I am. It says that I have just set up a web page on my computer. However, this would not be a very good thing for me to do. Why? Why would this be bad for me, to keep a web server running on this laptop? Let's say it had my website on it. I put up a bunch of photos, for example. No, no bad photos, just photos. Yes? Right, so that's a very good point. So let's say it's a particularly busy site and it's very popular, so now every time someone tries to visit this site it takes cycles on my computer to process sending this web page. But there's an even more important distinction. I typed in the IP address here. What happens when I go home? Yeah, it changes. I have a completely different IP address. So now even if I had a host name or a domain name that pointed to this IP address as my computer, I'm not going to be able to access it, because it is not going to update these DNS servers all the time. The DNS servers for mydomain.com or whatever it is will point to one IP address, and if I stupidly set it up to be 140.247.228.18, it'll only work when I'm here. But even more specifically, when I come back and I request another IP address from the DHCP server, it might give me a different IP address, so I wouldn't have the same one, so that server would still be inaccessible.
So what's next? Monday, tonight, or later by tomorrow, we'll be releasing the next problem set, which ties together hardware, software and the internet, but it won't be due for a couple of weeks because we do have the exam coming up next Monday. That will be an hour long, so after the hour we'll call it a night so that everyone can sort of get home early, decompress from the exam, catch in fact maybe a little Heroes. All right, but we've got a lot of fun stuff coming up in the weeks to come. So we've just concluded our focus on hardware, software and the internet. The week after the exam, though, we'll come back and discuss multimedia: various file formats, streaming audio, streaming video, compression techniques and stuff that's very much in vogue and on your desktops today. We'll spend a couple weeks talking about security, so we'll use some of the same building blocks from the past couple of weeks and talk about what can a bad guy actually do with this information, and what can you, presumably the good guy, do to protect yourself against such. Then in week nine we'll dive into website development. Around this time you'll start thinking about what domain names you're gonna buy, for like $6.95 or $9.99, for the course's final project. We're gonna teach you HTML and a bit of Cascading Style Sheets, with which you can make your own websites on the course's shared server. We'll talk about programming, so you get a sense of what software developers do and how they do it. We'll have a second of two movie nights, we'll get a little more popcorn from Johnny at Kendall Square and watch Startup.com, a movie that traces the rise and fall of a famous start-up, a failed famous start-up, and we'll have our exciting conclusion in the final week. So it's good. Thank you all for coming. See you next week.