This is the 22nd lecture in the course Design and Engineering of Computer Systems. This week we are discussing how the internet works, and in this lecture we are going to understand how internet routing and forwarding work, so let us get started. In the previous lecture we saw the various layers in the network stack. Internet software is built using the principle of layering, and the network layer sits roughly in the middle. It is also called the IP layer or layer 3, and it deals with forwarding network packets from one IP address to another. So now let us understand what an IP address is. An IP address is a unique identifier given to every network interface that is connected to the internet. Earlier we said that every end host has an IP address, but it is actually every network interface: if you have an Ethernet card and a Wi-Fi card on your laptop, you will get two IP addresses. The Ethernet interface will have its own IP address, and the wireless interface will also have its own IP address. These IP addresses are either 32-bit IPv4 addresses, which are written in dotted decimal notation (for example 192.168.10.1), or 128-bit IPv6 addresses. IPv6 has come about primarily because we are running out of IPv4 addresses. End hosts are connected to each other through a series of IP routers, and these routers run routing protocols. These are distributed, decentralized protocols: each router runs the protocol on its own, and the routers exchange information with each other. As a result of this communication, the routers discover the paths between the various hosts of the internet, and the information about all the available routes is stored in the routing table.
From among these multiple available routes, the best route is chosen for forwarding packets, and the best route is stored in what is called the forwarding table. So the forwarding table has, for every destination, the best path on which to send the packet. For example, if you have a long path and a short path between two hosts A and B, then the routers along the short path will carry the packet, and it will not be sent along the long path. So routers usually forward traffic on the shortest or best possible path, and for each destination, the information about which path is best and what the next hop on that path is gets stored in the forwarding table. So the forwarding table is a kind of summary of the routing table. Note that when I draw a line between two IP routers, it does not mean they are always connected by one wire running from one to the other. The link between two IP routers can also be a very long link, and there could be other elements on the path. The IP layer does not care about any of that; it is taken care of by the link layer. The link layer deals with how to provide connectivity between two IP routers, or between an end host and an IP router. We will come to that later. For now just assume that there are multiple hosts and routers, each with their own IP addresses, which are all connected to each other. Next let us understand the concept of an IP prefix. We have seen that routing protocols exchange information about hosts. Now at what granularity do you exchange this information? There are 2^32 IPv4 addresses, and if everybody has to exchange information about all of these 2^32 hosts, that is a lot of information on the internet.
So we do not want to tell everybody about every host. Instead, we exchange information at the granularity of groups of hosts, or subnets. IP addresses are grouped hierarchically into subnets by using a common prefix. Suppose you have 32-bit IP addresses where the first 24 bits are common, say 192.168.10, and the last 8 bits can be anything. How many such IP addresses do you have? You will have 256, because the last 8 bits can be any of the 2^8 possibilities. These 256 IP addresses are together referred to as 192.168.10.0/24. The slash 24 says that the first 24 bits are common and the last 8 bits can be anything. This is called an IP prefix: a group of internet addresses which have a common prefix. Similarly, in a slash 8 prefix the first 8 bits are common, and inside it you will have 256 slash 16 prefixes, each slash 16 will have 256 slash 24 prefixes, and each slash 24 will have 256 IP addresses. You can group prefixes into bigger and bigger prefixes. A prefix is also denoted by its subnet mask. The subnet mask has all ones in the bits that are part of the prefix and zeros in the remaining bits, so for a slash 8 the mask is 255.0.0.0: the first 8 bits are used and the remaining bits are not. So you can specify either the prefix length, saying this is a prefix of length 24 or of length 8, or you can specify the subnet mask. These are two ways of describing how big this grouping of IP addresses is. IP addresses are also assigned at the granularity of IP prefixes. Each end host will not go somewhere and get an IP address for itself; instead an organization will be given an IP prefix, that is, a group of IP addresses.
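The prefix arithmetic above can be checked with a small sketch using Python's standard `ipaddress` module. The specific prefixes are only illustrative values taken from the running example; nothing here is specific to any real network.

```python
import ipaddress

# 192.168.10.0/24: the first 24 bits are fixed, the last 8 bits are free.
net = ipaddress.ip_network("192.168.10.0/24")
print(net.num_addresses)  # number of addresses in the prefix
print(net.prefixlen)      # prefix length
print(net.netmask)        # the same prefix expressed as a subnet mask

# A /8 splits into /16 prefixes, and each /16 splits into /24 prefixes.
big = ipaddress.ip_network("192.0.0.0/8")
slash16s = list(big.subnets(new_prefix=16))
slash24s = list(slash16s[0].subnets(new_prefix=24))
print(len(slash16s), len(slash24s))

# Membership test: does an address fall inside a prefix?
inside = ipaddress.ip_address("192.168.10.1") in net
print(inside)
```

The prefix length and the subnet mask are two views of the same thing, which is why the module can convert between them.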
Now, if an organization is given a slash 8, it can split it into multiple slash 16 prefixes or into slash 24 prefixes. Just as a slash 8 contains multiple slash 16s, which contain multiple slash 24s, within an organization you can always split a prefix into smaller prefixes. But addresses are managed at this granularity, not at the granularity of individual IP addresses. Similarly, routing protocols also exchange information at the granularity of IP prefixes. Every router will say, these are the prefixes that I am managing, I know how to get to these prefixes, this is the path to this prefix. All of that information is stored not per individual destination but per IP prefix. Your routing tables and forwarding tables will say, for this slash 24 prefix, for this slash 8 prefix, for this slash 16 prefix, this is the path. And when a router gets an IP packet, it finds out which prefix the destination IP address belongs to and, for that prefix, where it should forward the packet. Now, what if multiple prefixes match? For example, the IP address 192.168.10.1 matches the slash 24 prefix 192.168.10.0/24, and it also matches the slash 8 prefix 192.0.0.0/8. If an address matches multiple prefixes, the logic is that you pick the longest prefix, that is, the most specific one, because that is the convention on the internet. The longer the prefix, the more preferable it is. In this way, routing tables and forwarding tables store routing and forwarding information about internet hosts at the granularity of IP prefixes of various lengths.
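The rule of picking the longest matching prefix can be sketched in a few lines of Python. The forwarding table contents and the next hop names (`router-A` and so on) are made up for illustration, and the linear scan is only meant to show the logic; real routers use specialized data structures for this lookup.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop (names are illustrative).
FORWARDING_TABLE = {
    ipaddress.ip_network("192.0.0.0/8"): "router-A",
    ipaddress.ip_network("192.168.0.0/16"): "router-B",
    ipaddress.ip_network("192.168.10.0/24"): "router-C",
}

def longest_prefix_match(dest, table):
    """Return the next hop for dest, preferring the most specific prefix."""
    addr = ipaddress.ip_address(dest)
    # Collect every prefix that contains the destination address.
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no route known for this destination
    # Among the matches, the longest prefix length wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

print(longest_prefix_match("192.168.10.1", FORWARDING_TABLE))
print(longest_prefix_match("192.168.20.5", FORWARDING_TABLE))
print(longest_prefix_match("192.1.1.1", FORWARDING_TABLE))
```

The first address matches all three prefixes and is sent toward the slash 24 entry; the second matches only the slash 16 and slash 8; the third matches only the slash 8.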
This matching algorithm, where you pick the longest one if an IP address matches multiple prefixes, is called the longest prefix match or LPM algorithm.
Next let us understand what the internet structure, or the internet topology, looks like. The internet is not one big giant network; it is composed of multiple smaller independent networks, which are called autonomous systems or ASes. These autonomous systems can be end user organizations, that is, organizations that have clients and servers: your college, your company, or somebody providing a web service such as NPTEL. The internet also has what are called internet service providers (ISPs), who connect all of these end user organizations. So you have end user organizations connected by an ISP, then you have another ISP, and another, and these ISPs are all connected to each other. You have multiple tiers of ISPs, which together connect all these end user or stub organizations. So this is how the internet is structured: there are stubs and there are ISPs, connected to each other in a hierarchy. Every network here, whether it is an ISP or an end user organization, gets a few IP prefixes for itself. There are what are called registries in the internet, which are sort of neutral third parties that allocate IP addresses to these various organizations. So if you are a company or an organization, you go to a registry, get an IP prefix, and then allocate addresses from that prefix to all the hosts inside your organization. And how are these IP addresses distributed within an organization?
Suppose an organization got a slash 24, so it has 256 IP addresses. How does it give them out? You can statically assign them: you tell every computer, this is your IP address. But a computer may not be using its IP address all the time, so you can also do dynamic allocation. There is a protocol called DHCP: there is a DHCP server, and if a computer wants an IP address temporarily, it can go to the DHCP server, get an IP address, and use it for a short period of time. When the computer is not using this IP address, somebody else can use it. In this way, IP addresses can be statically assigned or dynamically shared between all the hosts in an organization. But of course, you can only assign addresses from the prefix given to you; you cannot randomly pick any IP address that you want. Then every autonomous system has some routers at the border, which are called border routers. These border routers announce the IP prefixes to other autonomous systems. For example, if you have some IP prefix, you will tell your ISP, these are all my IP addresses, please tell everybody. And the ISP will tell everybody that this autonomous system owns these IP addresses. That way everybody on the internet knows where you are, and whenever somebody wants to send traffic to you, they know your location and can send the traffic. So you advertise your route in one direction and traffic flows in the opposite direction. You tell somebody, this is my IP address, and then that somebody can send traffic back to you using the routing information that you have provided. So the purpose of ISPs is basically to propagate this information.
End users have these IP prefixes, and the ISPs propagate that information so that everybody knows about all of these end users. And the ISPs charge a payment for it. They announce your IP addresses to the rest of the internet on your behalf, and for that you pay your ISP. That is what it means to get internet service from an ISP: the ISP will come, put a wire into your network, and tell everybody else about your IP addresses, so that everybody knows about you and can send traffic to your IP addresses through your ISP. Within an autonomous system you will also have internal routers. An autonomous system can be big. So you will have some border routers which talk to the outside world, and you will also have internal routers which exchange information with each other, because you have split your prefix into smaller subnets and assigned them to different parts of your network. These routers talk to each other and exchange information about the smaller prefixes. So, in some sense, routing happens in a hierarchical manner. There is the big prefix that you have advertised through your ISP, and the ISP tells everybody else. They know where this big prefix is, and all traffic for it comes to your border router. The border router then sends the traffic inside, where there are smaller routers. Say you announced a slash 8: traffic to the slash 8 prefix comes to the border router, the border router distributes it to each of the slash 16s, and the routers there distribute it to the various slash 24 prefixes. In this way, routing and forwarding happen hierarchically on the internet: traffic to a slash 8 is distributed to the slash 16s, then to the slash 24s, and so on.
This hierarchical routing is important for the scalability of the internet. If everybody is learning about every host on the internet, it gets messy. So instead, the border routers announce only the bigger prefix to the ISPs, and whatever routing decisions have to be made internally are made by the internal IP routers. All of these routers exchange information about IP prefixes, so that you know where each IP prefix is, you can construct the network topology, and you can compute the shortest path. Along with IP prefixes, routers also exchange some notion of a link metric, to identify that this link is expensive to use, this link is shorter, this link is longer. You need some kind of link metric so that you can identify the shortest path. All these routers exchange information and periodically compute the best route to every destination. Whenever some failure happens and the network topology changes, the routers dynamically exchange information and recompute the best routes. There are roughly two philosophies of how you design routing protocols, which are called link state routing and distance vector routing. So what is link state routing? Suppose you have a topology of multiple routers. In link state routing, every router tells everybody in the network about all its links. One router will say, I am managing these prefixes and I am connected to this neighbor. Another router will say, these are all my links, and so on. In this way, every router tells the entire network about all of its prefixes and links, and once everybody knows about everybody else, everybody can construct the full network topology.
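With the full topology and link metrics in hand, a link state router can run a shortest path computation locally. The sketch below uses Dijkstra's algorithm, which is a standard choice for this step; the four-router topology and its metrics are invented for illustration, and the function name `shortest_paths` is hypothetical.

```python
import heapq

# Hypothetical topology learned from link state announcements:
# router -> {neighbor: link metric}
TOPOLOGY = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

def shortest_paths(topology, source):
    """Dijkstra's algorithm. Returns {router: (cost, next_hop_from_source)}."""
    best = {}
    visited = set()
    heap = [(0, source, None)]  # (cost so far, router, first hop used)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in visited:
            continue  # already settled with a cheaper cost
        visited.add(node)
        best[node] = (cost, first_hop)
        for nbr, metric in topology[node].items():
            if nbr not in visited:
                # Leaving the source, the first hop is the neighbor itself.
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (cost + metric, nbr, hop))
    return best

table_at_a = shortest_paths(TOPOLOGY, "A")
for dest, (cost, next_hop) in sorted(table_at_a.items()):
    print(dest, cost, next_hop)
```

The `(cost, next_hop)` pairs are what would go into router A's forwarding table: the router only needs to remember the next hop, not the whole path.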
Once each router has this full network topology, it can compute the shortest path from any point to any other point and forward packets accordingly. The other type of routing protocols are called distance vector routing protocols. Here routers do not tell everybody their information; instead they only gossip with their neighbors. One router will say, I know how to get to these IP prefixes, and its neighbor will say, okay, if you know, then I will use you. For example, if you have some destination D, one router will say, I know how to get to D and I have a three hop path, and another router will say, I have a two hop path to get to D. Then a router that hears the gossip from both of them will decide which of the two to go through in order to send a packet to D. In this way nobody has full knowledge of the entire network topology; everybody is only talking to their neighbors. But from what their neighbors tell them, they pick which neighbor to use to reach any destination. So these are two different ways of designing routing protocols. In general, link state protocols are somewhat better because they give you a more accurate picture of the network topology, and therefore they can recover faster when a failure happens. But in the real world you have implementations of both kinds of protocols in use. In a networking course you will study these protocols in more detail; we will not have time to cover them here. The other classification of routing protocols is this: the routing protocols used within a domain are called intra domain routing protocols, and these can be different from the inter domain routing protocols used by border routers, because you have different considerations.
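The neighbor gossip in distance vector routing can be sketched as repeated rounds in which every router hears its neighbors' current costs and keeps the cheapest option. This is a simplified, synchronous Bellman-Ford style sketch; the topology, costs, and the function name `distance_vector` are assumptions for illustration, and real protocols add timers, failure handling, and loop prevention that are not shown.

```python
INF = float("inf")

# Hypothetical links: router -> {neighbor: link cost}
LINKS = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

def distance_vector(links):
    """Returns {router: {destination: (cost, next_hop)}} after convergence."""
    routers = list(links)
    # Initially each router only knows how to reach itself.
    tables = {r: {r: (0, r)} for r in routers}
    # With n routers, n - 1 synchronous rounds are enough to converge.
    for _ in range(len(routers) - 1):
        # Snapshot of the costs each router advertises in this round.
        adverts = {r: {d: c for d, (c, _hop) in tables[r].items()}
                   for r in routers}
        for r in routers:
            for nbr, link_cost in links[r].items():
                for dest, cost in adverts[nbr].items():
                    new_cost = link_cost + cost
                    # Keep the neighbor that offers the cheapest way to dest.
                    if new_cost < tables[r].get(dest, (INF, None))[0]:
                        tables[r][dest] = (new_cost, nbr)
    return tables

tables = distance_vector(LINKS)
print(tables["A"]["D"])
```

Note that no router ever sees the whole `LINKS` structure in its own decision: it only uses its own link costs and what its direct neighbors advertise.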
Inside a network you just want to pick shortest paths, but outside on the internet you might also have policy considerations: this network paid me more money, this network did not pay me at all, this network is cheaper to use, this network is more expensive to use. You might have considerations other than just the shortest path. Therefore, there is an inter domain routing protocol that is very popular today called BGP, or Border Gateway Protocol. It is like a distance vector protocol but slightly different, and it can help you express all of these constraints, which is why it is used for inter domain routing. Within a network you will use a simpler protocol; OSPF is a very popular one today, and it is a link state shortest path protocol. So there are different routing protocols within a network and outside a network, and the network operator has to pick a suitable routing protocol based on his or her goals.
The other concept I would like to introduce is what is called label switched routing. Traditional networks use destination based routing: given a destination, I will take the shortest path to the destination. But taking the shortest path may not always be a good idea. What if some failure has occurred, or the shortest path is congested? Suppose you have a network where there are multiple long paths to the destination and one direct path to the destination, your destination is, say, D, and you have traffic coming from A and B through some router C going to this destination D.
If C always picks only the shortest path for every packet, by looking at the destination IP address, then the direct path to D will get very congested while there are other paths that are free. You might as well have split your traffic across all of these paths. But if you are just doing destination IP based shortest path routing, you will always pick this path no matter how congested it is. An alternative is to do what is called label switched routing. You assign a label to every flow: for example, packets from A on the A to D path get one label, packets on the B to D path get another label, and you say that traffic with label 1 goes one way and traffic with label 2 goes another way. There are many protocols for this; MPLS is the most popular one. Label switched routing is of course a complicated topic, but the high level idea is that you attach an extra label to a packet and use this label to do your forwarding. You create separate forwarding tables based on this label, so that you do not always have to do shortest path, destination IP based forwarding. You can assign these labels and then, based on your load, if one link is congested, you move label 1 to another path. You can shuffle around your labels and pin different labels to different paths to ensure that your network is not congested and is used more optimally. All of this is broadly called traffic engineering: you can pin different flows to the same destination onto different paths for better load balancing of traffic. Label switched routing also helps you recover from failures faster, because you are no longer waiting for the IP based routing to converge.
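The high level idea of forwarding on a label instead of the destination address can be sketched as two small lookup tables. This is only a toy model of the concept, not of MPLS itself; the flow-to-label mapping, the path names, and the helper functions are all hypothetical.

```python
# Hypothetical label assignment at the network edge: (source, destination) -> label
FLOW_LABELS = {("A", "D"): 1, ("B", "D"): 2}

# Hypothetical label forwarding table at router C: label -> outgoing path
LABEL_TABLE = {1: "direct-path", 2: "long-path-1"}

def attach_label(packet):
    """Return a copy of the packet with the label for its flow attached."""
    labelled = dict(packet)
    labelled["label"] = FLOW_LABELS[(packet["src"], packet["dst"])]
    return labelled

def forward(packet, label_table):
    """Forward on the label only; the destination IP is not consulted."""
    return label_table[packet["label"]]

def repin(label_table, label, new_path):
    """Traffic engineering: move one label to a different path."""
    updated = dict(label_table)
    updated[label] = new_path
    return updated

pkt_a = attach_label({"src": "A", "dst": "D"})
pkt_b = attach_label({"src": "B", "dst": "D"})
print(forward(pkt_a, LABEL_TABLE), forward(pkt_b, LABEL_TABLE))

# If the direct path gets congested, label 1 can be pinned elsewhere.
rebalanced = repin(LABEL_TABLE, 1, "long-path-2")
print(forward(pkt_a, rebalanced), forward(pkt_b, rebalanced))
```

Both flows go to the same destination D, yet they can take different paths, which destination based forwarding alone could not express.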
Again, this is not for every network, but in large data center networks it can be used in order to use your network efficiently.
The other concept that is being used in data center networks today is what is called software defined networking or SDN. Until now, what we have seen is that every router does both the control plane, that is, the routing protocol, and the forwarding, and this control plane is distributed across different routers. Instead, in software defined networking you separate out the control plane, the routing, into a centralized controller. There is one centralized controller, and there are multiple switches in the data plane that forward traffic. The controller tells all of these switches what to do: use this path, this is your shortest path, this path is better for load balancing, this path is better for traffic engineering. You can do better routing and better traffic engineering and recover from failures better, because you are using a centralized control plane and a distributed data plane. Of course it has its own problems. You cannot have one centralized control plane for the entire internet, so this is mostly used in data center networks and similar smaller scale networks, where there is a large amount of traffic and efficiency is important. So this is a newer idea for how to design networks that is very different from the traditional ideas used in the internet.
One other concept that is important when we study real life computer systems is private IP addresses. There are 2^32 different IPv4 addresses possible. Out of these, a few prefixes, for example 10.0.0.0/8 and 192.168.0.0/16, are reserved for use internally within organizations. These are called private IP addresses.
So what does that mean? These IP addresses are used only within an organization, only for intra domain routing, and are never announced in inter domain routing. Therefore multiple organizations can reuse the same IP addresses. You can assign the address 10.1.1.1 in one organization, and you can assign the same address in another organization. So these private IP addresses are like nicknames. They do not have to be globally unique; they are limited to one organization and announced only within that organization. Why do we use private IP addresses? One reason is that IP addresses are getting exhausted, and we may not have enough IP addresses for everybody. The other reason is that you want to isolate the hosts in an organization. If a host has only a private IP address, nobody from outside can directly talk to it. Why? Because these private IP addresses are not shared with anybody outside; it is like your secret nickname. Therefore it also gives you a certain amount of security. And if you have two different networks with private IP addresses, you can connect them using VPN, or virtual private network, software. Suppose your college has a private network and you want to log into the college network from home. Then you use a VPN, which is a way to connect these private networks to each other over the internet. So organizations today usually use a combination of both public and private IP addresses. Note that you cannot live with only private IP addresses. You need some public IP addresses, for example for servers that talk to outside clients. If somebody outside has to send you a connection request, then you need a public IP address that is announced via DNS and is known to the client. But if you are only talking to people within your organization, then you do not need a public IP address; you can simply use private IP addresses.
It is like in your home, where all of you can address each other using nicknames, but when somebody outside has to talk to you, they have to use your publicly known name. That is the concept of public IP addresses and private IP addresses. If there is a client inside that has to talk to an external server, then you also need a temporary public IP address. Clients also need a public IP address because the source address is also put in your IP datagram, and that has to be a public address. These temporary public IP addresses are assigned via what are called NATs, or network address translators. If a client inside a network that is using a private IP address has to talk to some server outside, then the NAT in your network will temporarily, for this connection, give you a public IP address and rewrite the header in your packet. In the reverse direction it will remove the public address and give you back a datagram with your private IP address. In this way, network address translators can manage large networks with a small number of public IP addresses. So now, putting all of these concepts together, let us understand how a client and a server on the internet talk to each other.
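Before walking through that end to end path, the header rewriting done by a NAT can be sketched as a translation table. This sketch assumes the common design where one public address is shared and connections are told apart by port number; the lecture only says that the NAT hands out a temporary public address, so the port mapping, the `Nat` class, the packet dictionaries, and all addresses here are illustrative assumptions.

```python
class Nat:
    """Toy NAT: shares one public IP by mapping connections to public ports."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}   # (private_ip, private_port) -> public_port
        self.back_map = {}  # public_port -> (private_ip, private_port)

    def outbound(self, packet):
        """Rewrite the source of a packet leaving the private network."""
        key = (packet["src_ip"], packet["src_port"])
        if key not in self.out_map:
            # First packet of this connection: allocate a public port.
            self.out_map[key] = self.next_port
            self.back_map[self.next_port] = key
            self.next_port += 1
        return {**packet, "src_ip": self.public_ip,
                "src_port": self.out_map[key]}

    def inbound(self, packet):
        """Rewrite the destination of a reply back to the private host."""
        private_ip, private_port = self.back_map[packet["dst_port"]]
        return {**packet, "dst_ip": private_ip, "dst_port": private_port}

nat = Nat("203.0.113.5")
request = {"src_ip": "10.1.1.1", "src_port": 5000,
           "dst_ip": "198.51.100.7", "dst_port": 80}
sent = nat.outbound(request)
reply = nat.inbound({"src_ip": "198.51.100.7", "src_port": 80,
                     "dst_ip": sent["src_ip"], "dst_port": sent["src_port"]})
print(sent)
print(reply)
```

The server only ever sees the public address, and the reply is mapped back to 10.1.1.1 inside the organization.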
The client will use DNS to get the IP address of the server, open a socket, and write the message to the server into the socket. Then the OS will add all the headers, do the transport layer processing, and send the packet out. The packet will go through a series of IP routers and finally reach the server. And how do you get to your first IP router? On the way you will have multiple Ethernet switches, if Ethernet is your link layer technology, or some other wire, or a wireless network, over which you send the packet to this IP router. This IP router might use another link layer technology to go to the next IP router. In this way, across different links, you jump from one IP hop to the next and finally reach your server. So there are different layers here: the application layer, the transport layer, the IP layer, the link layer, and finally the physical layer, all working together to establish this end to end communication. Of course there are many steps here and many things can go wrong; your connection can break at any point. That is why we have a bunch of debugging tools available to help us debug the working of networks. For example, there are tools that will help you resolve a DNS name, to check if DNS is the problem. You can use ping, where special packets are sent to an IP address to see whether the IP path is working. You can run traceroute, which will tell you the IP routers along your path. You can find out information about your link layer device itself using commands like ifconfig. You can find out information about all the socket communications going on using tools like netstat. In this way, for each layer there are separate debugging tools that will help you understand how that particular layer is working.
In this lecture I have briefly covered how IP addressing, routing, and forwarding work, and where the link layer fits in. I did not go into a lot of detail on any of these, because covering them in detail requires a complete course, but I have given you enough to help you understand the end to end packet flow in a computer system and, if something goes wrong, how to go about debugging it. So please try out the tools that we have studied on your own; this will help you understand these concepts better. Thank you all, that is all I have for this lecture, and let us continue this discussion in the next lecture.