 You know, but I did that before we jumped on. So let's get started. Cool. We have super fun. You all can see that. And people on Zoom, can you confirm that you, you can see the slides? All right, thank you. All right, I missed you all. That's the first thing I'll say, although we interacted a lot over Zoom. So I wasn't too far, I mean, through Discord, so I wasn't too far away. Cool. Now that you've learned a ton, we're going to turn our attention to network security. So this is actually one of my favorite topics that we cover in this course because we really get to kind of peel back the curtain and actually learn how networking works, right? Which is something that is integral to our lives, right? So there's people who are right now listening on Zoom. And so the question in your mind actually as a computer scientist in general, and as a curious person would be, well, how does that work, right? You know that there's computers involved, you know that somehow the data, my voice packets, the video packet here, is coming from my computer right here and then beaming to the 40, you know, 60 some odd people who are on Zoom right now. Right, that has to happen somehow and that data has to go over the internet, traverse networks that I have no idea about. I don't have any control over of what direction it goes. And so there's really interesting concepts here about how that actually happens. So this, in case you are, I don't know, what week is this? It's the week before spring break, let's say. Seventh or something like that? Yeah, so it's like the seventh week. So you shouldn't be surprised that this is not a networking course, but we are gonna cover networking. So we're gonna go at the depth that is necessary to understand the security implications so that you will fully understand when somebody tells you or when you tell somebody else why it's a bad idea to connect to an unsecured Wi-Fi network, you'll actually be able to say exactly why that that is a bad thing to do. 
And the unsecured we mean something without a password. And we're not just gonna understand this at a theoretical level, we're gonna study the protocols involved at almost all the layers of the networking stack so that we can see what attacks we can do and what attacks an attacker can do at those levels. So there's a lot of really cool stuff in here. This involves kind of learning how networking works so that we can break stuff. Any questions before I get started on either here or on Zoom? Let's jump off. So we first start with the IP. So the, we're gonna talk about this as a suite. As we'll see, there's actually a number of different protocols at all different layers of the stack. And actually one of the things that I want to develop as we're going and as we're studying these things, and this is the things that help you when you're learning things, right? When you're learning some kind of protocol and they say, oh, they have the bits are here and these bits are here. You should ask yourself why, right? Some human had to design this protocol. Why is it the way it is? And so I'll try to point out as we go through some of the things why certain things are the way they are and what the benefits and drawbacks of those approaches are. Sometimes the drawbacks they never thought about because as we'll see a lot of these protocols were done in like the 70s and 80s before we were even thinking about security. So there's a lot of security implications, but there's also a lot of good design decisions that went into these things. And you can apply these to other aspects of your career. So that's why it's really important to think about these. So we're gonna think about the entire, so we'll call the internet protocol, suite, stack, all different types of names. Basically the set of protocols that's used to transport data from between different nodes on a network. So it's actually kind of brings up an interesting question. 
So yeah, the point of these networking protocols is to transmit data between different basically computing systems. So why am I saying like a super broad computing systems not just like computers? Switches, yeah. It includes routers and switches and your wifi thing at home. I mean, I'm connected to wifi right now. There's probably one, if not multiple in this room because it's a 438 person capacity. So maybe you'd want two routers. I actually don't have no idea how they do that, but there's definitely some wifi router somewhere here that my packets are going to. That's figuring out where to send it to within ASU. And then ASU needs to figure out how to get it to zoom and so on and so forth until that data finally gets to where it needs to go. So because of that, there's a lot of different things that need to happen there, right? There needs to be, we need to have some way to transmit that data, like the actual video frames or whatever, which are encoded themselves in their own different protocols. But beyond that, I need a way like there needs to be, there's a wifi standard of how the wifi devices talk. I actually know very little about like wireless networking. So I don't know much about how that works, and all these different, what do we have to do? Is it like AG or AN, or I don't even know the latest standards, but there's a lot of them to shoot data super quickly across this. Cool, so we're going to look at that. And the other thing, there's also called like the TCP IP stack. We'll kind of see these form kind of the core protocols, but there's a lot of different aspects here. And one of the key design principles that I want you to be thinking about is this notion of abstraction and encapsulation. 
And that's actually why we're going to look at it, and why people think about it, in terms of these different layers. There's the physical layer: how the data gets from me to the router, which here is done with wireless. But then there's the question of how my machine even knows to talk to that wireless router. If I'm just shooting data into the air, how does your computer know that it's not meant for you, and how does this router know that it's meant for it? That's a different layer. And then the layer on top of that is basically: how do I get data to Zoom, or to whoever it's ultimately going to? And the key thing here, and this is again another key security principle, is that you always want to look for the places where the abstractions break down. As we'll see, it's not the beautiful model that they maybe teach you if you take a networking course. There's actually some bleeding between the layers, and that causes security problems, as we'll see. So we start with the link layer. There are physical protocols at the lowest layer, then link layer protocols, which we'll talk about, then internet protocols. So what's the difference between an internet and an intranet? Yeah, so Zoom is saying an intranet is like a local network, right? Think about it this way: if you're ASU, you own all the routers, all the computers, whatever, so you can make whatever kind of crazy networking scheme you want inside. The problem is when you try to talk externally, when different organizations need to talk to each other. That's how the internet actually came about: the internet is a connection of intranets, basically. Each local intranet could do things its own way, but rather than translating whatever you speak into some different protocol every time you talk to somebody else, it's a lot easier if you run the same thing that the internet runs on.
So we'll look at that. So is Bluetooth considered an intranet? Interesting. Yeah, I mean, I actually don't know all the specifics about how Bluetooth works in terms of networking. I think it's more of a point-to-point communication, like a link layer of how you make a connection. But I think it's classified as a personal area network. A personal area network, I don't know what that means, but maybe it's like a local network just based around those Bluetooth devices. So maybe it has some, I think the key difference there is in an intranet, you can get information in the same protocol to different machines inside one intranet. But I'd say there if it's just like a hub and spoke, like a bunch of things connected to one thing and they can all talk to each other, that's not quite the same as being able to pass data between other systems. Cool. And we'll look at transport protocols. So how do we send data? What's one of the key problems that can exist or that you may have faced when using the internet or sending data on the internet? Is everything always fine? You're just like very happy with your internet connection. Have you ever been on a Zoom call where the person's like, audio dropped or video messed up lag? I can see the gamers in the chat talking about lag. Yeah, why is that a problem? Why does that even happen? We made all this stuff, right? Shouldn't it just work louder, please? There's a lot of that. Yeah, there's a lot. We don't know, we don't control exactly what physical route the data is taking. And therefore we have no idea of what kind of physical things it's gonna happen, right? Like literally data is flowing through the air and the wireless to this wireless router. Somebody could be jamming me in this class. They could be sending a lot of signals to break up my connection there. It could be environmental concerns. 
Actually I read a, I think it was a paper blog post a while back that was talking about one of the most common causes of internet outages is squirrels because they end up like chewing on some of the lines and then I would assume the squirrels die at some point but I don't know exactly the physics there but eventually like squirrels actually ruin internet or another paper I saw looked at the correlation between the weather and internet outages. And they found, of course, which seems intuitive but when there's a big storm and everything actually there's a lot of internet problems, right? So all of these things, your data is traversing these systems that you know nothing about, you don't control somebody could literally just go and like cut the link that your computers are using to connect. And so how do we get this data out there? How do we deal with this kind of uncertain conditions? So these are the super interesting things there. Some of these claiming that that's just service providers pointing fingers. I've heard this from other non ISP sources. And then we have at the highest level application protocol. So what are some of the common applications that you use on the internet? Google, that's too high level. That's something that runs on the internet. What does Google use? How do you talk to Google? Say it again? Your web browser, what's your web browser use? Yeah, that's what specific browser. What's the protocol that it's actually using to talk to Google.com? Yeah, HTTPS or HTTP, right? So those are and the web. So the web is actually, so this is one of the key things that if you learn this distinction will make you seem much more informed about things so a lot of people conflate the web and the internet. So what's the difference between the two? Close, close. Yeah, so domain names factor into it, although not quite, they're not exclusive to the web. What, let's think about it this way. 
What are some non-web application protocols that you maybe use on the internet, or that you've heard about? Spotify? Spotify probably uses the web, although I don't know that for certain. SMTP. Anybody ever send an email? Yeah, so the Simple Mail Transfer Protocol, SMTP, is the way that your mail goes from computer to computer. That has absolutely nothing to do with the web, and it actually predates the web. I think we'll go into it, but the web was created in '91 at CERN, and the internet was around in the 70s and 80s. So they had email way before they ever had the web. Was that a "wait, what?" on the age of the web? Yeah, the web was only created in the early 90s. It started at CERN. So what does CERN do? C-E-R-N. There's nobody here who's ever taken a physics class, or is interested in physics? CERN. Yeah. Yeah, it's a particle physics collider or something. I actually have no idea what the acronym stands for, right? Basically they're trying to slam particles together to see what happens, I guess. I don't know, I'm not a physicist, but I know it at least at that level. And so they had a bunch of people who would continually rotate in and out of the organization. And one of the engineers, or if he wasn't an engineer, I don't remember his exact job, but he was working at CERN, was Tim Berners-Lee. And he realized, oh, it'd be great if we had a way to list people and offices and phone numbers. He had been keeping track of ideas like hypertext, where each document could have links to other documents, and you could follow those links and edit things. And so he literally invented HTTP, HTML, and URLs, URIs, so that they could do that there at CERN. And it took off like wildfire: the web that we know today is literally all based on that stuff. So it was insane. 
Like, I think he created the first ones in '91, '92, and by '94, '95 we had major web browsers and more of the early web. So anyways, cool. Lots of different applications. That's one of the key things to realize, and we'll talk about some of them, right? Some of you have probably used one when using pwn.cse365.io. How do some of you remotely access that system? With what? Yeah, with SSH. SSH is a completely separate protocol that has nothing to do with the web. It's how you access a remote system, and it uses a different protocol than websites use to transmit data. So let's take a look at the stack. At the bottom, we have the physical layer. This is what's used to physically transmit data. If you're on wireless, this would be the Wi-Fi, whatever, whatever. If you're on Ethernet, it would be whatever Ethernet uses to send data. You can make an entire career out of optimizing this layer, right? You gotta think about where we get all these new advances in Wi-Fi. It's from people who study these things and make them more efficient. You also have to deal with multiple people talking at one time. Anyways, there's a lot of things there. Then above that, we have the link layer. This layer sits on top of the physical layer, and it's what says, okay, I'd like to send this information to this specific machine on my local network. And we'll get really in-depth into what the difference is between local and remote here and exactly what that means. How many layers are there? Three, four, five? It depends on exactly which model you're using. I've heard some people use seven, some people use five. We're gonna use five here. So then above the link layer, you have the internet level. 
So the link layer is basically: how do I get data to a system that's local to me, that I have direct communication with, for some notion of direct? Then above that is IP. The internet level says: how do I get that data to Zoom, right? My ultimate destination, Zoom, does not exist in my local network. So how do I get it to someplace far away from me? Then above that, we have the transport layer. The two classic examples here are TCP and UDP, two different protocols. It's really important to understand these and how they work. And finally, we have the application level at the top. So HTTP, again, is just one of the types of applications that run on and use the TCP/IP stack. SMTP we talked about, the Simple Mail Transfer Protocol. DNS, the Domain Name System, is what is used to translate google.com into an actual Internet Protocol address so that you can talk to it. If you didn't have that, we'd have to memorize 32-bit numbers to access websites, which would be insanely difficult. But the really cool thing is that it's not something special: it actually uses this whole mechanism. You use one protocol, DNS, to resolve and translate a domain name into an IP address that the machines can use, and then they use that to make a connection. NFS, anybody ever mess with NFS? Or know what it is or what it stands for? What was that? Network File System. Yeah, so this is for when you wanna share files between machines. If you run a home NAS or something like that, you can expose it as a network file share. If you run Linux and use Samba, Samba and SMB do, I think, a similar job to NFS. I actually don't know what the precise differences are. Questions about this level? Yeah, SMB is what Windows uses. It's a similar idea. But anyways, no questions? Everything's crystal clear? Louder, please. At the physical layer or higher up? I think this depends a lot. 
So the way I think about it is that the link layer provides, as we'll see, a standard interface to send data from one machine to another on the same local network. How it does that with the physical layer is up to whatever the physical layer is. There's all kinds of crazy stuff. I think one of the older ones was a token ring network. The idea was that switches were super expensive and difficult to do, so what you do is connect each machine in your network one to the other. Each machine would essentially have, I don't know if it was Ethernet or coaxial or whatever, but one physical in and one physical out to the next machine. And when you want to send a packet, you would just send it out on your out. It would go to the next machine and then keep going around. Everyone takes it and looks: is this for me? Nope, and passes it off to the next machine in the ring. You can see that's an insanely inefficient way of doing something. But when you have limitations, when you don't really have switches and those kinds of things, it actually can work. So this is why the link layer works exactly the same, whatever kind of crazy scheme you have at the physical layer. And we'll see it's actually a standard way of saying, oh, is this packet for me or not? And already in this diagram we actually have some bleeding between the layers, and we'll see this. We'll get into ARP, the Address Resolution Protocol, which translates between IP and the link layer. As we'll see, there are different addresses at both. You may know, oh, I wanna talk to this IP address, but you don't know its link address on your local network. So we'll look at that. RARP is the reverse: I have this link address and I wanna translate it to an IP address. There are protocols for each. Let's see, other things here: ICMP. Anyone ever use the ping command? To test if something's up? Yeah. 
So I'll show that off later, but this is basically, this is using a specific ICMP message that says, hey, please send this back to me. That's actually built in for network debugging. The reason why you learned how to use the console, yeah. You mean besides this class. Cool. So let's dig in. So we first have to start with addresses. Why? So the goal here is to be able to transmit data from any one machine to any other machine, but there's one piece, super important piece of information we need to know. The other thing I'd like to think about is the post office. Anybody actually physically write a letter? Anybody actually get mail or you all, I don't know, a shoe, digital mail? What do you have to do to actually send a letter to somebody in the United States? You put a stamp on it so you pay for it? Well, you write the address. Why do you need to do that? I don't need to know where it's going. Who needs to know where it's going? The post office needs to know where it's going, right? They need to know, how do I physically get this letter from us here in Tempe, at the Tempe post office and how do I get it wherever it's going? St. Louis, wherever, right? Do you even need necessarily a name? What's actually required? So if you don't put postage, they're not gonna mail it, right? Do you need a name? Are they gonna verify the name? Not necessarily. You could kind of put whatever you want. I don't think anybody's gonna check. Yeah, resident. Do you need an address? Yeah, it seems silly, right? But you think about what's necessary. If you don't have an address, they literally don't know where to deliver that mail. Actually a fun fact about the post office, I don't know if it's true anymore, but the legend was that especially in small towns, the post office person would know everyone in the town. So rather than specifying the exact address, if you didn't know the address, you could just put identifying details like the old man Ford's ranch or something and it would get there. 
Or you could say something like "the red house down by the big oak tree by the river," and they would know who you're talking about and actually get the letter to you. People are mentioning the return address. Do you need a return address? Or, put another way, what's the point of a return address? One of you. First to raise their hand. Okay, sorry. Yeah, two reasons, right? One is that if the post office has trouble delivering the letter, they can send it back to you. Why is that useful for you? So you know that it failed. You know that they didn't actually get your letter, and you can try again. Maybe they'll say, hey, your description wasn't good, or there's nobody at this address, or the person moved, whatever. What's the second reason why you have a return address? So one's for the post office. You wanna give the other one? Yeah. If somebody wants to send a reply. Not just somebody: who specifically? Yes, the person I send the piece of mail to needs to know how to send something back to me, right? And that's why I put a return address. So if you don't care about a response, then who cares if you have a correct return address, right? Oh, sorry. All right. So we need something like this on the internet, right? It actually has a lot of the same properties. We need an address because we need to know where to send data. It's exactly the same principle. We also need to know our own address so that the other side knows how to send data back to us. And we actually have very similar mechanisms for failures. If something happens along the way, there are mechanisms for the internet, or more specifically one of the switches along the way, to send us back a note saying, hey, your packet got lost, sorry. It also may not, because you always have to think of the scenario: what if my packet was inside of a switch that just got unplugged, right? Good luck trying to get feedback on that. So okay, does everyone agree we need some sort of address? 
What's the problem with human addresses, with the addresses we use in the post office? Yeah, they don't, who doesn't understand? Yeah, computers don't really understand physical addresses. They're not uniform. There can be crazy differences between the Fifth Avenue and Fifth Street. And if you don't maybe specify them, there can be ambiguities. So specifying exactly which places, which is, could be difficult. What else? Yeah, lots of repetition in physical addresses, right? I don't really care. Does the post office care that you live on this specific street, right? They just care that they can deliver that data to you specifically. Cool. Yeah, distances between house numbers vary in a weird way. I actually have no idea how that works. It's probably fascinating. Cool. So in order to do that, we need an address. So we'll get into the specifics of what that is and how it works. For now, we just know that, hey, if we want to be on the internet, we need an IP address. If we don't have an IP address, we can't send data and we can't receive data. It just doesn't like, it'd be like trying to, I don't know, send data from a place that the post office has no idea about and it just, it's not going to work. You're never going to get your data. And so here, a network interface you could think of as, you could have many, you could have a machine with multiple ethernet cords. You could have a machine with two or three or four wifi adapters, all kinds of craziness. So each host, so if you want to be a machine on the internet, you need an IP address and you can actually have one or more, each network interface can have one or more. And okay, we're going to be studying IPv4 because I think it's simpler. So IPv4 addresses have 32 bits. How many IPv4 addresses do we have? Possible, like theoretically possible. Two to the 32, how many is that? Yeah, roughly four billion. Is that enough? Seems really big. It seems like a large number. Do you like two to the 32 dollars? 
Yeah, so the problem is, we already have more than four billion people in the world. Of course, you could say not every person needs an IP address. But yes, devices can have more than one IP address, and people can have more than one IP address, right? Think about what's on you right now. You maybe have a laptop, which has at least one IP address. You probably have a cell phone, which has an IP address. You may have a tablet, which has an IP address, right? All these things have and need IP addresses. And so when you start multiplying these out, it gets kind of crazy. And this is actually a big problem right now: we've run out of IPv4 addresses. There's very little available IPv4 address space, so it can cost a lot of money to get some. But luckily we've kind of, sort of moved to IPv6, and there are a lot of reasons why you don't need to worry about it too much as a consumer, and we'll see exactly why. And so yeah, IPv6 addresses use not 64 bits, but 128 bits. So how many is that? A lot, a lot. It's hard to overstate how much that is. Something like more IP addresses than every grain of sand, or something like that. Something like every atom could have its own IPv6 address, or something. I don't know. You can look it up and get all these kinds of comparisons. Okay, but we will ignore that for now. We'll focus on 32 bits, and on the way that we represent IP addresses. If you had to think about an IP address as a 32-bit number and say, oh yeah, I'm 10,052, or I'm 1,560,064, that would be kind of difficult and annoying. So the more human-readable way to do this is dotted decimal notation. This is basically taking those 32 bits of the IP address and splitting them up into eight-bit chunks, so into bytes. The uppermost byte is the first octet. 
So everything before the first dot can be 0 to 255 in decimal, the second one 0 to 255, the third one 0 to 255, the fourth one 0 to 255. It's a much easier way. Has anyone not seen an IP address like this? What's maybe the most famous one that comes to mind? Yeah, 127.0.0.1, which is in a special range. That means your local system. So yeah, 127.0.0.1 is localhost. And actually a fun fact: there's a class of vulnerability in some websites where they go to fetch a page for you, and rather than having them make a web request to something external, you trick them into fetching from their own local system, which maybe has admin functionality and all this other stuff. So a lot of places will block an address that looks like 127.0.0.1. The fun thing is that a lot of things will also accept the raw 32-bit number written in decimal. So you can translate the address to a single decimal number and just request that, and it will actually work. It's kind of crazy. But anyways, this kind of notation and representation thing is interesting. The important thing for you is that these differences don't matter, right? This is just a way to represent the value. It's like the difference between hex and decimal. Does it mean that the number we're talking about is different? What's the only thing that's different? Yeah, how we're choosing to present and discuss those values. Cool. What was that? It's like different ways of serving the same dish. Yeah. So if I gave you an IP address in this dotted decimal notation, you'd be able to tell me, yes, that's a valid address, or no, that's an invalid address. And what would you use to figure that out? Yeah, you'd check each octet and see: is it between 0 and 255? If you see 256 or higher, you know this is a bad IP address. It's not valid. Cool. All right. Now what we're going to get into here is that we need to understand how data moves. 
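To make the octet check and the "raw 32-bit number" trick concrete, here is a minimal Python sketch. It is only an illustration and not something from the slides: the function names `parse_dotted_decimal` and `to_dotted_decimal` are made up for this example, and the parser deliberately accepts only the plain four-octet form.

```python
def parse_dotted_decimal(text):
    """Convert a dotted-decimal IPv4 string to its 32-bit integer value.

    Raises ValueError if the string is not four octets in the range 0-255.
    (Hypothetical helper written for this example.)
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected exactly four octets: " + repr(text))
    value = 0
    for part in parts:
        # Each octet must be plain ASCII digits and fit in eight bits.
        if not (part.isascii() and part.isdigit()):
            raise ValueError("octet is not a decimal number: " + repr(part))
        octet = int(part)
        if octet > 255:
            raise ValueError("octet out of range 0-255: " + repr(part))
        # Shift what we have so far up one byte and append this octet.
        value = (value << 8) | octet
    return value


def to_dotted_decimal(value):
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# The same address, presented three ways: dotted decimal, decimal, hex.
localhost = parse_dotted_decimal("127.0.0.1")
print(localhost)                     # 2130706433
print(hex(localhost))                # 0x7f000001
print(to_dotted_decimal(localhost))  # 127.0.0.1
```

The value never changes between the three print statements; only the presentation does. That is also why a filter that only blocks the string "127.0.0.1" can miss the same address written as 2130706433.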
We need to be able to understand and answer this question: is this IP address in our local network or not? There actually used to be an early way of splitting up IP addresses into local and remote, where you'd say, okay, everything that has the same first three octets is in my local network, or everything that has the same first two octets, or the same first octet, is in my network. It turns out that's a highly inefficient way of distributing IP address space, because your granularity is not very good. So there's a system called CIDR, Classless Inter-Domain Routing, that does this. So, blah, blah, blah. Cool. What we're going to do is specify this boundary. To say what's a local network versus what's not, we can use CIDR. So, let me explain. And you can't see any of this. Okay, this is annoying, give me a second to fix the display. All right. Okay. So what we needed to decide is what's a local network, and we're going to use CIDR notation for this. The format here is an IP address, then a slash, then a number that says where the split is. I think this is best done by example. Can you read this in the back? Can I make it bigger? I hope so. I'm going to keep hitting buttons until something happens. Okay. So 192.168.0.1/24. We know this is an IP address, 192.168.0.1, right? Now the /24 says that the higher 24 bits are the network part, and the lower 32 minus 24, which is eight, so the lower eight bits, identify the hosts in that network. What this means is that any IP address that starts with 192.168.0 is in the local network, right? So /24 means put a slash at the 24th bit, right? And I picked 24 because it's very easy: all of this octet, all of this octet, and all of this octet are the uppermost bits. 
And so we say, okay, to this specific machine. So the other important thing is this is a machine configuration. So to be part of the internet, you need to not just know your IP address, you need to know where the split is and you need to know how to determine if things are on your local network or an external network. And there's a really important reason for that because if it's on our local network, we know we can just talk to it directly and we'll see that we can use a link layer communications to do that. If it's on our external network, we need to do other things to make sure it hops. So the other way to think about it is the link layer determines that first hop. How do we get something first to one hop and then everything else determines how it goes further along. Cool. So if I say, okay, this is my machine's configuration. This is my IP address. This is my network configuration. I could ask you a question like, is 192.168.1.1 local? How would you figure this out? What's the algorithm for this? This is still our thing. So we're gonna stick with this for right now, for example. Yeah, compare the higher 24 of ours, which is 192.168.0. With the higher 24 of here, do these bits match? No, it's not, right? And this is again, it's all about this example, right? If I now said our network split is 192.168.0.1 slash 16, now I could say is 192.168.1.1 local? Yeah, why yes? What's the algorithm, right? This is, we're about developing how we do this, right? So I know the split is 16. I look at the top 16 bits here. And I say, yes, these are identical. These are exactly the same. So I'd say, yes, this is a local address, right? What if 192, now, now let's say it's 192.168.0.1 slash 17. And now we'd ask is 192.168.1.1 local? Why is this more difficult? Yeah, it's no longer byte chunk. So we can't just do it, right? Yeah, it's no longer byte chunk. So we can't just do it just by looking at the octets here, right? 
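The comparison being described here, "take the top N bits of both addresses and check whether they match," can be sketched in Python as follows. This is a hedged illustration, not code from the lecture: the helper names `ip_to_int`, `is_local`, and `first_differing_bit` are invented for this example, and bits are counted from 1 at the most significant end to match the way they are counted in class.

```python
def ip_to_int(text):
    """Pack a dotted-decimal IPv4 string into a 32-bit integer (no validation)."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def is_local(my_ip, prefix_len, other_ip):
    """Return True if other_ip shares the top prefix_len bits with my_ip."""
    # Build a mask with prefix_len one-bits at the top, e.g. /17 -> 0xFFFF8000.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return (ip_to_int(my_ip) & mask) == (ip_to_int(other_ip) & mask)


def first_differing_bit(ip_a, ip_b):
    """Return the 1-based position (from the top) of the first bit that differs,
    or None if the two addresses are identical."""
    diff = ip_to_int(ip_a) ^ ip_to_int(ip_b)
    if diff == 0:
        return None
    return 32 - diff.bit_length() + 1


me = "192.168.0.1"
print(is_local(me, 24, "192.168.1.1"))    # False: third octet differs
print(is_local(me, 16, "192.168.1.1"))    # True: top two octets match
print(is_local(me, 17, "192.168.1.1"))    # True: top 17 bits still match
print(is_local(me, 17, "192.168.128.1"))  # False: bit 17 is set in 128
print(first_differing_bit(me, "192.168.1.1"))    # 24
print(first_differing_bit(me, "192.168.128.1"))  # 17
```

Because the mask works on individual bits, the split does not have to fall on an octet boundary, which is the whole point of CIDR. Python's standard `ipaddress` module answers the same question with `ipaddress.ip_address("192.168.1.1") in ipaddress.ip_interface("192.168.0.1/17").network`; the hand-written version is only meant to expose the bit arithmetic.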
I mean, you kind of know, because you could do it in your head. It doesn't matter that the split isn't on an even boundary. The whole point of CIDR is that you can put the split at any bit; what this says is anywhere from 13 all the way to 27. So you can put it kind of anywhere within there. So what could you do? You could just run the algorithm. There's a couple of ways: you could translate each of those octets into hex. Actually, let's do that. I can't just translate things into hex in my head, I'm sorry if I'm letting you down, but we can look at our example: 192 is 0xC0, 168 is 0xA8, 0 is 0x00, and 1 is 0x01. So ours is C0 A8 00 01, and the other one is C0 A8 01 01. Now this definitely makes sense, because we know the split is somewhere in the third byte, and we know that the upper nibble there, a nibble is half a byte, is the same. Those four bits are exactly the same. And we could even go further. Do we wanna do it? Yeah, let's do it. This is always fun. Wait, why is it D0? Okay, C0 A8 00 01. And if we were a computer, we could translate this down to binary: 11000000 10101000 00000000 00000001. And the other one is 11000000 10101000 00000001 00000001. So we could just look at the top 17 bits if we were a computer: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, right? Luckily computers are way faster than I am at this, in case you were wondering. I'm gonna put them next to each other to make it even easier. You guys are supposed to check me on this. And so you could say, yep, the upper 17 bits are exactly the same, right? Oh, I'm using the calculator, just in case anyone wasn't clear. I'm using these bits right here. I did not do that by hand. Oh, I clicked on it.
Oh, that's cool. Look at that, I just learned something: you can click on the individual bits to change them. Oh, did you guys think I was translating that into binary just by looking at the hex? I should have just let you believe that, that would be great. All right, no, I'm not a computer. So we can actually see where the difference is if we go all the way out. We're at 17, 18, 19, 20, 21, 22, 23. It's only at the 24th bit that it changes, right? And this makes sense, because the only difference there is a zero versus a one in the third octet. And the other thing is 128. So for 192.168.1.1 we said, yes, this is local. Where this would change is: is 192.168.128.1 local? And the answer there should be no, because 128 is 0x80, so it has the highest bit of that octet set. That would change the 17th bit from zero to one. I mean, I can show this: it will be exactly like this, except this bit will be one. And we'll say, no, this is not local, because these are different. Cool. So does everyone agree that you can calculate these to determine local versus external addresses? By not saying anything, you agree that you can do this on tests. Okay, cool. Let me guess, we'll have to do this for a challenge? Yeah, you're demonstrating that you actually understand this stuff instead of just sitting here in class nodding. That's why we have assignments and exams and stuff. I know it all seems arbitrary, but sometimes it has a purpose. Okay, cool. So now we can answer that really important question: is it local? Oh, and this isn't just theoretical, right? You can actually look at this on your network. Now, it may be represented differently. On some systems you specify it in that CIDR-like notation, where you give the IP address and then a slash with the network/host boundary.
Other systems, like here right now on this network... oh, you cannot see. Okay, cool. So now that you can see this: on this current system that I'm on, I'm 10.153.17.139. And my netmask here is shown the other way, as 0xffffc000. So this is a mask. Let's see, definitely the top 16, right? This ff ff is all ones for 16 bits. And then c must be something similar: c is the top two bits of its nibble. So this is 16, 17, 18. So what I'm on right now is a /18 in our terminology. Everybody see how we went from this netmask to there? Okay, I'll show you one other system. Here, this is again a completely different way of showing you the netmask, but it's exactly the same philosophy. These are all gonna be bits. So 192.168.1.150, with netmask 255.255.255.0. Right, so this would be a /24, because in the netmask, all the bits that are one are the network part, and where they end is your boundary. And so you say, okay, this is a /24. So this is the information that, as a machine on the network, you need to know. You need to know what's my IP address and what's that network/host boundary, so I can tell what's on our local network versus external. And this is exactly how we would do that. So this is why the answer to that question is different depending on your network configuration, right? And that's why we went through this example and saw that, hey, it depends on your network configuration. Okay, cool. So the IP protocol is one of the most important things; it's essentially the glue of the internet. So it's kind of the layer, right? It makes sense. We have IP addresses. The thing is basically called the TCP/IP stack, right?
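Before moving on, the two netmask readings from the demo (0xffffc000 and 255.255.255.0) can be checked mechanically. This is a small sketch under the assumption that a netmask is a run of one bits followed by a run of zero bits; the function names `dotted_to_int` and `netmask_to_prefix` are invented for this example.

```python
def dotted_to_int(text: str) -> int:
    """Convert dotted-quad text such as '255.255.255.0' to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def netmask_to_prefix(mask: int) -> int:
    """Return the CIDR prefix length (number of leading one bits) of a netmask."""
    inverted = ~mask & 0xFFFFFFFF
    # For a contiguous mask, the inverted value is 0b000...0111...1,
    # so adding one to it leaves no bits in common with it.
    if inverted & (inverted + 1) != 0:
        raise ValueError("netmask bits are not contiguous")
    return bin(mask).count("1")


print(netmask_to_prefix(0xFFFFC000))                      # hex form from the first demo
print(netmask_to_prefix(dotted_to_int("255.255.255.0")))  # dotted form from the second demo
```

Both displays encode the same thing as the slash notation: counting the one bits gives the number after the slash.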
So we know if there's two hosts on the network, we know that each host has its IP address and it has its network host boundary. So it knows how to say what's internal to its network or what's external. Now, the IP protocol provides a connectionless, unreliable best effort datagram delivery service where delivery integrity ordering non-duplication and bandwidth is not guaranteed. Whoo, sounds kind of crappy. Let's take it step by step. What does connectionless mean? Anybody know how the old phone systems worked? Have you ever seen, yeah, have you ever seen those pictures of people physically putting wires into systems to route calls? Do you know what they're actually doing? They're establishing a direct connection between one phone and another. So that way you can call the other phone and it rings and there's dedicated bandwidth in the phone network for that call because it's going over physical wires that nobody else is using. And so that's like a connection networking service. We have two systems and because that's what the operators are doing, right? They're plugging in and saying, oh, this person's phone needs to talk to this person's phone. And that's literally physically what they're connecting. And that's how you literally call into the operator would then route your call using that physical methods to wherever it was actually going. So that's a connection-based service where you're building this physical connection between the two. We don't have that. There's no guarantee. There's no notion of a connection at the IP level. We'll do our best as we'll see to try to get data there, but you're not building a dedicated connection to another system. Unreliable makes sense. We talked about it a lot, right? Machines get unplugged, squirrels eat our cables. The IP level does not guarantee that your data will ever get there. A best effort is also kind of another way of saying the same thing. Like, hey, we're gonna try to get it there, but if things happen, tough luck. 
A datagram: what does a datagram mean here? There are several different ways you can think about it, like a packet or a unit of data. What this means, and connectionless also means this, is that I'm not just sending an endless stream of data over IP. As we'll see, in an IP packet I'm sending a fixed amount of data from one system to another. Okay, this seems really crappy. Why do we want this? So let's go back here. Shouldn't we just start over, rebuild the whole thing, rip out this crap in the middle that's not giving us any guarantees? Why do we build our whole society on this stuff? Seems kind of crazy, right? Clearly there wasn't a better alternative. I disagree, and we'll see why. They could have provided reliability here. They chose not to, yes, but why? Yeah. Because it abstracts away the flow we're going to process. Yeah, okay, another way of thinking about it: you're building these stacks and these layers, right? Think about anything that you build into a lower layer, like reliability. Say the IP level guaranteed: hey, if you give me a packet to send to another host on the internet, I will either deliver that packet or tell you that I couldn't. It could definitely do that. But then everything above it has to pay that cost, and not just pay the cost: it has to support it, because it's built in, right? Does every single application that you would want to run on the internet desperately need every packet to arrive? What are cases in your life where it's okay if a packet gets dropped? Right now, everyone on Zoom, would you even know if one frame of video was dropped from this call? And what if that one frame was dropped and you had to wait to see the rest of the video until it finally reappeared, maybe 10 or 20 seconds later? How do you know how long to wait? These are all questions.
Skype and voice is another example. Zoom is pretty good about this, if you pay attention. When someone's connection is bad, they'll kind of drop, and then they'll get back on, and Zoom will compress everything they were saying into something really fast to catch up. But random audio packets can also be dropped and everything's fine. So this is a key design principle, and this is what I want to call your attention to when building these kinds of architectures. It seems insane when you realize that the core of the internet does not provide reliability or any of the things that we would want. And that's because, as we'll see, higher levels do provide that. TCP does, and we'll see how that's done. But not everything needs that. Other examples are games, right? In multiplayer games you usually have to synchronize state between the server and the clients. But if one packet is dropped, you don't want that to hold up all the other update packets. You just let that one drop and keep going, because it'll be much faster than waiting for it. Cool. Okay. And the core concept here is that we can exchange IP datagrams between any two nodes on the internet as long as they both have an IP address. So again, just like we talked about with the post office, as long as you have an address, you can get data there. Okay. The other thing I want you to not freak out about is looking at packets and protocol specifications. So you should see this. Good. What is it, RFC 791? If you want to find out how this stuff works, it's not hidden in secret, right? This is one of the core things about networking protocols: they have to be made so that other people can implement them. If they're not, then what are we doing?
You know, if I have a networking thing but I need you to build something to work with me, and you can't see how it works, it's never gonna work. So you can literally just go read the specification. RFC stands for Request for Comments, which is one way that standards get discussed and adopted. And you could go read this. What I'm gonna be presenting in the slides is just this information that exists in a text document. And yes, reading these things is a skill that you develop, but you can figure out exactly how almost any protocol works just by reading the documentation and the specs. But I obviously have a nicer picture, so we'll start there. Okay, so the very first thing in an IP datagram is four bits. The way to read this is zero to 31: we're thinking of these in 32-bit chunks, but of course it's just one big long stream of data. We're just splitting it up visually. So the version is the first four bits. Why did they make the version the first four bits? Yes, close: so that you know how to interpret the next bits, right? And this is a great example for any protocol. The very first thing that you should read from a protocol is: what version are you? This makes upgrades compatible. When you upgrade to IP version six, all I have to do is read those first four bits, and now I know how to parse the rest of the packet and understand that it's an IPv6 packet. Whereas if I had other information, for instance if the IP addresses came before the version, I would have to interpret them the old way, right? I can't upgrade anything that comes before that version field. HL, I believe, is header length. There are actually some things in here that aren't really used, so I won't go super in depth here.
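The "read the version first" idea can be sketched as follows. This is an illustrative fragment, not a full parser; the function names are made up here. One detail beyond what the slide shows: in IPv4 the header-length nibble counts 32-bit words, so a value of 5 means a 20-byte header.

```python
def ip_version(packet: bytes) -> int:
    """Return the IP version stored in the upper four bits of the first byte."""
    return packet[0] >> 4


def ipv4_header_length(packet: bytes) -> int:
    """Return the IPv4 header length in bytes (the HL field counts 32-bit words)."""
    if ip_version(packet) != 4:
        # The low nibble only means "header length" once we know this is IPv4.
        raise ValueError("not an IPv4 packet")
    return (packet[0] & 0x0F) * 4


first_byte_v4 = bytes([0x45])  # a typical first byte of an IPv4 packet
first_byte_v6 = bytes([0x60])  # a typical first byte of an IPv6 packet
print(ip_version(first_byte_v4), ipv4_header_length(first_byte_v4))
print(ip_version(first_byte_v6))
```

Note that the header-length lookup refuses to run until the version has been checked, which is exactly the ordering argument made above.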
The total length is 16 bits that I believe represent how many bytes are in the totality of this IP packet. Then there's an ID, an identifier; I think we'll get into some of this stuff. Flags, fragment offset. Time to live is actually a really important thing. So think about this. We're sending data into the network. Let's say switch A says, okay, I get this packet, I send it to switch B. Switch B says, oh, I get this packet, I send it to switch C. Switch C gets this packet and says, oh, great, I send it to switch A. What happens to that packet? It loops infinitely. It literally stays around in the network forever, until one of the switches happens to drop it because it's full and has memory constraints. So to prevent that from happening, there's a time to live field, which is decremented at every hop along the way. Once it reaches zero, the packet is dropped and the switches don't carry it on anymore. This actually has really interesting debugging capabilities. I think we'll look at it later. If you've ever used something like traceroute, this is how it works and how it figures out the hops and everything. The protocol field is interesting because it has information about the upper-level protocol. Then a checksum. The checksum is computed over the header so you can make sure it matches some value. It's a simple 16-bit checksum, a ones' complement sum rather than a true CRC, I believe. Then we have, of course, the source IP address, the destination IP address, some options, and padding so that what follows is at a fixed offset. And then the data below. So what's gonna be in this data? Or I guess the other way to phrase that: does the IP level care what's in that data? No. No, and that's the point. As we'll see, it could be a TCP packet. It could be an IP packet. Or sorry, it can't be an IP packet. Probably can't.
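To make the time-to-live idea from a moment ago concrete, here is a toy simulation of the A to B to C to A forwarding loop. It is only a sketch: the function name `forward_in_loop` and the ring-of-names model are invented for illustration, and real forwarding involves much more than this.

```python
def forward_in_loop(ttl: int, ring: list) -> list:
    """Return the switches a packet visits in a forwarding loop before it is dropped."""
    visited = []
    position = 0
    while ttl > 0:
        visited.append(ring[position])
        ttl -= 1  # every switch that handles the packet decrements time to live
        position = (position + 1) % len(ring)  # A -> B -> C -> A -> ...
    # The switch that brought the TTL to zero drops the packet instead of forwarding it.
    return visited


print(forward_in_loop(5, ["A", "B", "C"]))
```

Without the decrement, the `while` condition would never become false, which is the infinite loop described above.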
Yeah, it could be TCP, UDP, it could be something brand new that we've built on top of it, but we don't care, because our only job is to get this data from one machine to the other. Let's see, is there anything else? Oh, yeah, the header checksum is interesting, because you can actually have problems in transmission where bits get flipped, right? Again, we know this is traveling over some physical medium. Now, what's the difference between a checksum and a hash? Could a hash be used to detect flipped bits? Say I hashed this header, everything except this field, and then put the hash value in here. Could you check and verify that no bits were flipped? No? Why not? Sure, but if you flip one bit, is the hash gonna be different? Yeah. And then you can check that and detect it. And you can have multiple inputs that hash to the same value. The odds of finding that, though, depend on the size of your hash, right? With a 16-bit hash, you could brute force all two to the 16 values, but I'm not talking about an attacker doing this. I'm talking about a random bit flip in transmission. So you could do that. I mean, an attacker would also control this hash, so if they wanna change something they could just recompute the hash and put it there, right? Anyway, the reason you don't do that is that it's more expensive. The math operations for a checksum, they're not cryptographic operations, are much simpler and more efficient. So that's why we don't have hashing at this level. The goal is to catch random bit flips, not anything crazy. Okay, cool. So we can think of the IP packet like this: we have this header that we just looked at, with the format and everything.
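Going back to the header checksum for a moment: RFC 791 defines it as the 16-bit ones' complement of the ones' complement sum of the header's 16-bit words, which is just addition and a bit flip. The sketch below shows that arithmetic; the function name `internet_checksum` and the 20-byte sample header are assumptions made for this example, not anything shown in the lecture.

```python
def internet_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the header's 16-bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero
zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(zeroed)
filled = zeroed[:10] + checksum.to_bytes(2, "big") + zeroed[12:]

# A receiver recomputes over the whole header; an intact header sums to zero.
print(hex(checksum), internet_checksum(filled))

# Flip one bit (in the time to live byte) and the check no longer comes out to zero.
corrupted = bytearray(filled)
corrupted[8] ^= 0x01
print(internet_checksum(bytes(corrupted)))
```

A handful of additions per header is far cheaper than a cryptographic hash, which matches the goal stated above: catch random bit flips, not attackers.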
We have the data, and then, as we'll see, this will go inside an ethernet frame, which is the link layer. We'll look at that next. So the way to think about it is that from ethernet's perspective, it's gonna have its own header and then its data, and that data happens to be an IP packet. So when you're thinking about data moving along a network, you have these onion-like layers of different headers. Usually, if it's an ethernet frame, first you'll have the ethernet header, then in that frame's data you'll have the IP header, then in the IP data you'll have the TCP header, and then in there you'll have the application data, which may itself have headers, right? But the point is each layer only has to worry about its own stuff. Okay, cool. So what we're gonna look at now is IP direct delivery. Direct delivery means the two hosts are on the same network. How do we know that they're on the same network? I mean, on the same local network. What was that? Yeah, we can look at the IP address and the netmask, right? Or the CIDR notation, however we wanna do that. It's exactly what we did. The computer does precisely what we did, just much faster than we can. It asks, are we on the same local network? If yes, then I can use IP direct delivery. So in this example, the subnetwork is the same idea as the netmask. This would be a /24, so 111.10.20 is my network. I have two machines, and one is 111.10.20.121. And the other thing we're gonna look at in a second is that I actually need a different type of address, a physical address. That's the MAC address. So basically we're gonna study how a packet gets sent from 111.10.20.121 to 111.10.20.14.
I know by checking my network information that yes, this is a local address, meaning I can do IP direct delivery. And then we'll show how that data actually moves on the local network, but to do that, we need to first look at the ethernet frame. The sizes here are in bytes. At the link level, ethernet addresses are six bytes, and they're typically represented in hexadecimal, as we've seen. So there's the destination, the source, the type, and then the data. It's actually a very simple protocol. And then there's a checksum at the end as well; that's why I said there are multiple checksums at multiple layers. The different types are for things like ARP or reverse ARP, but we'll look at those in a second. And good, we've got plenty of time. All right, so ethernet is one of the most widely used link layer protocols. This does not necessarily mean physical ethernet cables. I actually don't know why the names are interlinked, because your Wi-Fi has ethernet-style frames built on top of it. So addresses at the link layer are 48 bits, and they're typically represented in hexadecimal separated by colons. So here, 09:45:FA:07:22:23 is a physical address. There's actually a scheme where different network card manufacturers have different prefixes, and you can go look up what those are. Now this is kind of an important thing. Do we need to worry about global collisions? Have we run out of ethernet addresses? Too many devices. So is your claim that there are too many devices and we have run out of ethernet addresses? Yeah. It's not true. Why not? We can recycle addresses, and why? How come my ethernet address here can or can't conflict with some machine at Google that happens to have the same one? It's local, thank you. Who was that? Yeah, it's a local address. It's only significant inside my local network.
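As an aside, the frame layout just described (six-byte destination, six-byte source, two-byte type, then data) can be sketched with Python's `struct` module. This is a simplified sketch: the helper names and the source address are made up for illustration, the destination is the example address from the slide, and the trailing frame checksum is ignored. The type value 0x0800 is the one registered for IPv4 payloads.

```python
import struct


def format_mac(raw: bytes) -> str:
    """Render a six-byte link layer address as colon-separated hexadecimal."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame: bytes):
    """Split a frame into (destination, source, type, data); trailing checksum not handled."""
    destination, source, frame_type = struct.unpack("!6s6sH", frame[:14])
    return format_mac(destination), format_mac(source), frame_type, frame[14:]


# Hypothetical frame: slide's example address as destination, made-up source, IPv4 type
frame = (
    bytes.fromhex("0945fa072223")    # destination address
    + bytes.fromhex("020000000001")  # source address (invented for this example)
    + struct.pack("!H", 0x0800)      # type: the data is an IP packet
    + b"ip packet bytes go here"
)

destination, source, frame_type, data = parse_ethernet_header(frame)
print(destination, source, hex(frame_type), len(data))
print("manufacturer prefix:", destination[:8])  # the first three bytes of the address
```

The parser never looks inside `data`, which mirrors the layering point: ethernet carries the IP packet without caring what it contains.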
I will fundamentally never know Google's ethernet addresses unless I'm on their local network, and Google has a big problem at that point. And again, this is part of what I'm trying to get you to think about. This is why we need many more IP addresses: IP addresses have to be public, and you have to know somebody's IP address in order to communicate. Whereas I don't know, and won't ever know, the MAC address of anybody who's not on my local network. Yeah? My question is, with the IP address system, why didn't they just set it up so that there's a separate index for, like, some... Say that again? Like, the normal 32-bit IP address would never name a device on the internet. It would represent a network connected to the internet, and then attached to that 32-bit IP address would be an index or whatever for the device. Yes, okay, that's a good question. It was kind of built that way from the start. The 32-bit IP address was separated into different classes, where in one class the first octet was the network and the rest indexed a specific device. In another class the first two octets were the network and the last two octets were the device, and then in another the first three were the network and the last one was the device. But the problem was that's not a fine enough granularity. So that actually is the core idea, because part of your IP address does specify your network and part of it specifies your machine inside that network. So you actually can think of it that way. The problem is they only use 32 bits for both of those together, rather than, like you said, maybe 32 bits for the network and then 32 bits for the host or something. But then the problem is every network would be capable of having two to the 32 hosts, which you don't really need in a lot of cases.
So anyway, there are all kinds of trade-offs. And that's why IPv6 bumped it up to 128 bits. I think it's the same principle there, where you can put that split wherever you want, so you can have as large an internal network as you want, with many devices. Cool. Okay. An interesting thing that we'll note here and put in our brain for later is that the maximum size of the data in an ethernet frame is 1500 bytes. Okay. Actually, I hate stopping early. I want to give you your money's worth. So remember what we're trying to do here: we're trying to send an IP packet from one machine to the other. And the only thing I know about the destination machine is its IP address. But if it's on my local network, I need to send it physically over ethernet, and I need to know its MAC address, right? Because otherwise I can't send it at the link layer to where it needs to go. So we need to have some kind of protocol to translate between the two, and that is ARP. ARP is the protocol that translates between IP addresses and ethernet addresses. So again, check yourself. Does it make sense to make an ARP request to, like, google.com to get their MAC address? No. Why? What was it? They're not local. I have no use for their MAC address, right? So by the principles we just talked about, ARP is only available on your local network, and we'll see exactly how that works. It does have nice security properties, but that's not the fundamental reason why. Cool, all right, there we are. It's 10:15. We'll come back on Thursday and get digging into ARP.