Now, let's get into the nuts and bolts of the Internet Protocol and its associated protocols. The way these pieces all work together is often described in this four-layer model. At the bottom is the so-called link layer, which refers to the workings of each individual network. Ethernet, for example, is a very commonly used standard at the link layer. In an Ethernet network, each device on the network is known by a unique MAC address (MAC stands for Media Access Control), and each device on the network can send Ethernet packets to any other. Aside from the data payload itself, each packet contains a header and footer which describe the packet, such as the source and destination MAC addresses. The limitation of Ethernet and other link layers is that they handle sending packets within just the local network, not between networks. That's where the Internet Protocol of the internet layer comes in. IP packets travel within each network via the link layer, for example inside Ethernet packets, but IP packets also get passed from network to network. Aside from the data payload itself, each IP packet contains a header which describes the packet, such as the source and destination IP addresses. The transport layer provides two main choices of protocol, UDP and TCP. Both of these protocols include source and destination port numbers, which effectively specify which program on the sending host sent the packet and which program on the receiving host is intended to receive the packet. The difference between UDP and TCP is that TCP provides reliability, while UDP does not. By itself, IP is unreliable, meaning that IP packets might get lost in transit. For many applications, this unreliability is of course unacceptable, so TCP keeps track of which chunks of data actually reach their destination, and then automatically resends any chunks of data which didn't make it. UDP provides no such mechanism.
Finally, the application layer refers to the application-specific protocols used by various programs. Web browsers and web servers, for example, communicate using HTTP, the Hypertext Transfer Protocol. Meanwhile, email clients and email servers communicate using SMTP, the Simple Mail Transfer Protocol. Application protocols are very numerous, so we'll only cover HTTP here. So that's the complete picture of the four layers. The link layer packets carry IP packets, which themselves carry UDP or TCP packets, which in turn contain the actual data which programs mean to exchange. This data is itself expressed in terms of some application protocol appropriate to the application, such as HTTP or SMTP. Now let's look at IP, UDP, and TCP in detail, starting with IP. Looking at the header of an IP packet, we have first the version of the Internet Protocol, specified as 4 bits, and then the length of the header, also specified as 4 bits. This length value counts units of 4 bytes, so it gets multiplied by 4 to give the header length in bytes, and because an IP header is at least 20 bytes, the header length value will always be 5 or greater. The next two fields, the Differentiated Services Code Point and the Explicit Congestion Notification, we'll ignore because they're effectively optional. Some hosts and routers use these fields to help regulate the pace at which packets get sent, but many hosts and routers simply ignore these fields. The total packet length field, however, is not optional and specifies the size in bytes of the whole IP packet, including the header. After this, we have three fields that concern what is called the MTU, the Maximum Transmission Unit. The different networks that make up the internet may use different link layers, and different link layers may allow for different maximum packet sizes. For example, one network might allow for 512-byte packets, but another might allow for 1024-byte packets.
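To make the first few header fields concrete, here is a minimal Python sketch that decodes the version, the header length, and the total length from the start of an IPv4 packet. The function name `parse_ipv4_prefix` and the sample bytes are made up for illustration, and the rest of the sample header is simply zeroed out.

```python
import struct


def parse_ipv4_prefix(packet: bytes) -> dict:
    """Decode the version, header length, and total length of an IPv4 packet."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # Byte 0: version in the high 4 bits, header length value in the low 4 bits.
    # Byte 1: DSCP and ECN (ignored here).
    # Bytes 2-3: total packet length, in network (big-endian) byte order.
    version_and_length, _dscp_ecn, total_length = struct.unpack("!BBH", packet[:4])
    version = version_and_length >> 4
    header_length_value = version_and_length & 0x0F
    if header_length_value < 5:
        raise ValueError("header length value must be 5 or greater")
    return {
        "version": version,
        # The 4-bit value counts 4-byte units, so multiply by 4 to get bytes.
        "header_bytes": header_length_value * 4,
        "total_length": total_length,
    }


# Hypothetical sample: version 4, header length value 5, total length 500.
# The remaining 16 header bytes are zero-filled for brevity.
sample = bytes([0x45, 0x00]) + struct.pack("!H", 500) + bytes(16)
fields = parse_ipv4_prefix(sample)
print(fields)
```

The first byte `0x45` packs both 4-bit fields together, which is why the sketch shifts and masks it rather than reading two separate bytes.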
Before a packet can be sent across a network with an MTU that's smaller than the packet, the packet must be fragmented into smaller packets. When a packet is originally created, it gets an ID number, and if the packet gets fragmented, the fragments all share this ID number. Now, the ID field is only 16 bits and so only allows for 65,536 unique identifiers, meaning that in fairly short order, a host will have to reuse previously used IDs. What could conceivably happen then is that a host will receive unrelated packet fragments that share the same ID. However, in practice, this isn't a huge problem because there is generally a good amount of time between reuses of an ID and because a packet reconstructed from unrelated fragments will almost certainly get discarded after the receiver looks at the checksum. Remember, IP is consciously designed to be unreliable, so it's okay if some packets get thrown out. Anyway, after the ID field, we have three flag bits. The first bit is reserved for future use, but for now is always expected to be set to zero. The second bit is called the DF (Don't Fragment) flag. For a packet with the DF flag set, routers will discard the packet if it ever requires fragmentation. This DF flag is useful in a few scenarios, such as when sending to a host which lacks the compute resources to reconstruct packets from fragments. The third bit is called the MF (More Fragments) flag. This MF flag will be set on every fragment except for the last one. After these flags, we have the Fragment Offset field, which indicates where the fragment's first data byte sits within the original packet, expressed as a multiple of eight bytes. All of this is probably easiest to understand from an example. Say we have this unfragmented packet with 500 bytes and the ID number 35111. Because it's unfragmented, the More Fragments flag and the Fragment Offset are both set to zero. Let's say this packet then gets fragmented into three smaller packets.
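As a rough sketch of how such a split could be computed, the Python below breaks 500 bytes of data into fragments and sets the shared ID, the More Fragments flag, and the offset on each one. The `Fragment` class, the function names, and the 240-byte per-fragment limit are assumptions chosen for illustration; headers are left out entirely, so this only models the bookkeeping, not real packets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    packet_id: int        # shared by all fragments of one original packet
    more_fragments: int   # MF flag: 1 on every fragment except the last
    offset: int           # position in the original data, in 8-byte units
    data: bytes


def fragment(packet_id: int, data: bytes, max_data: int) -> List[Fragment]:
    """Split data into fragments carrying at most max_data bytes each."""
    if max_data < 8:
        raise ValueError("max_data must be at least 8 bytes")
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the offset field counts 8-byte units.
    chunk = max_data - (max_data % 8)
    fragments = []
    for start in range(0, len(data), chunk):
        is_last = start + chunk >= len(data)
        fragments.append(
            Fragment(packet_id, 0 if is_last else 1, start // 8, data[start:start + chunk])
        )
    return fragments


def reassemble(fragments: List[Fragment]) -> bytes:
    """Put fragments back in order by offset and join their data."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    return b"".join(f.data for f in ordered)


original = bytes(range(250)) * 2          # 500 bytes of made-up data
parts = fragment(35111, original, 240)    # assumed limit of 240 data bytes
for part in parts:
    print(part.packet_id, part.more_fragments, part.offset, len(part.data))
```

With these assumed numbers the sketch yields three fragments at offsets 0, 30, and 60, and `reassemble` recovers the original data even if the fragments arrive out of order.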
All three packets will share the original ID, 35111, and all except the last fragment will have their More Fragments flag set to one. The first fragment represents bytes 0 to 239 of the original packet, so its offset is set to zero. The second fragment represents bytes 240 to 479 of the original packet, so its offset is set to 30, because 30 times 8 is 240. The last fragment represents bytes 480 to 499, so its offset is set to 60, because 60 times 8 is 480. So when a host receives each of these fragments, it has the information it needs to piece them back into the original packet, and also has the information it needs to know when it has all of the fragments. After the ID, flags, and Fragment Offset, we have the TTL (Time to Live) field. For various reasons, such as router misconfigurations, packets might end up stuck in a loop, passed multiple times through the same routers. To prevent zombie packets from endlessly going in circles, each packet is marked with this TTL countdown timer. When created, a packet starts with an initial TTL value, at most 255, and then this value is decremented by each router which the packet passes through. Once the TTL hits zero, the packet is discarded. After the TTL field, we have the Protocol field, which denotes the protocol of the data contained within the packet. In the vast majority of internet traffic, this will be UDP, TCP, or the not yet discussed ICMP. In principle though, IP packets can carry packets of many other protocols, such as legacy networking protocols like IPX. Next we have the header checksum. A checksum, if you're not familiar, is a technique for error checking. Of the many variants of checksum algorithms, the variant used in an IP header takes the ones' complement of the sum of all the 16-bit chunks of the header, not including the checksum field itself.
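The checksum just described can be sketched in a few lines of Python. This is a minimal illustration rather than production code: the function name `ipv4_header_checksum` and the sample header bytes are made up for the example, and the sketch relies on two details not spelled out above, namely that the checksum field sits at bytes 10 and 11 of the header and that any carry out of the 16-bit sum is added back in.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement of the sum of the header's 16-bit chunks,
    skipping the checksum field itself (bytes 10-11)."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("header must be an even number of bytes, at least 20")
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # leave the checksum field out of the sum
        (word,) = struct.unpack("!H", header[i:i + 2])
        total += word
    # Fold any carry above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte header whose checksum field holds 0xB861.
sample_header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
(stored,) = struct.unpack("!H", sample_header[10:12])
computed = ipv4_header_checksum(sample_header)
print(hex(stored), hex(computed), stored == computed)

# Corrupt the TTL byte (index 8) and the computed value no longer matches.
corrupted = sample_header[:8] + bytes([0x3F]) + sample_header[9:]
print(ipv4_header_checksum(corrupted) == stored)
```

A receiver would run the same computation on the header it received and compare the result against the stored field, which is the comparison the two `print` calls imitate.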
This checksum is computed by the packet's original sender to set the checksum field, and then each router or host receiving the packet performs the same computation and checks the result against the received checksum field. If the two values do not match, then the header data must have been corrupted in some way and so the packet is discarded. If the two values do match, then it's still possible the header data was corrupted, but it's extremely unlikely, because the checksum would have to be corrupted in just the right way to match the corrupted header. After the header checksum, we have the source and destination IP addresses, which need no real explanation. Finally, if the header length field value is greater than 5, an IP header may end with some extra bytes that specify extra options. In practice, however, options are very rarely used. So that's all the most important things to know about IP packets. Next, let's look at UDP and TCP. UDP stands for User Datagram Protocol, and TCP stands for Transmission Control Protocol. The primary difference between the two is that TCP automatically sends acknowledgments of received data and automatically resends data for which no acknowledgment was received within a given amount of time. UDP, in contrast, provides no such features and so is an unreliable way of sending data, just like IP itself. Another difference is that UDP data is sent in discrete independent chunks called datagrams, each explicitly sent by a process. In TCP, a process tells the OS which data it wishes to send, but the OS may choose to send that data in separate segments. The receiving OS then reassembles the segments into their intended order before handing the data to the receiving process. For these features of TCP to work, processes which communicate over TCP must first establish a connection. We'll explain what that means exactly, but first let's look at UDP headers.