 Welcome to the 21st lecture in the course design and engineering of computer systems. So in this week, we are going to study a little bit more detail on how computer networks work and specifically how the internet works. So this will tie together in the broad theme of the course where we are trying to understand how real world computer systems work. Okay, so let us get started. In this first lecture of the series of five lectures on how the internet works, I am going to give you a brief overview of the various components of the internet and then in the next four lectures, we are going to get down into more details. So here is a brief overview of what is the internet. Okay, so the internet is nothing but a collection of various networks, you know, different organizations, regional, national networks, wired, wireless networks, all of them, you know, there are different networks that are all connected and talking to each other, exchanging information that is the internet and all real world computer systems have many connected components which are distributed over this large internet. For example, if you are accessing say a website, you know, there is some computer out there that has all the information, the content of the website and some client out here that connects over the network and accesses this content. So understanding how networking works and how the internet works is essential to understand any concept in computer systems. So typically computer systems are designed as clients and servers, okay, and these clients and servers talk over the network and exchange information and the unit of information exchange on the network is called a packet, a packet is nothing but a stream of bytes. So a client or server, each of which puts some information in a packet, sends it to the other side and so on. So in this week, as I said, we are going to cover some basic principles of how networking works and how the internet works. Note that there are full courses dedicated to this topic, so I am not going to be able to cover full course worth of content here, but just enough to help us understand the broad goal of this course which is how real world computer systems work. So with that introduction of what this week is about, let us get into the details. So I will start with a simple example. So suppose you open your browser and you visit a website, say any commerce website or to book your tickets for travel or any such thing. So suppose you access a website. So let us just see through all the steps of what happens when you access a website. So here is a rough outline of what happens. So internet services, you know, websites or any other applications, ticket reservation to social media, everything, all of these services are provided by servers, application or web servers. So what are these servers? These are nothing but processes that are running on a certain computer, listening at a certain port number, you know, we have seen how to write a simple server like this using the socket API. So of course real world servers are much more complex, but the basic idea is the same. So you have a process or a set of processes listening on one or more machines at certain port numbers and they have the content of the website, they have information about all the products being sold at any commerce site or information about all reservations in a ticket booking portal, whatever it is. So these servers have all the information and clients, you know, users of the computer system access these servers over the internet using a URL. So for example, if you want to access courses on NPTEL, you will type this address in the address bar of your browser, this is called the URL, you will put that in and the clients will reach the content at the server. So what happens when you access this URL, what happens when the client accesses this URL? So this URL has two parts, it has a domain name which is nptel.ac.in and some other information about what content you want to access at that domain. So if you access this URL, the domain name is nptel.ac.in and some other information. So what the client will do is it will take this domain name and using this domain name it will look up a service on the internet called the DNS, clients will contact the DNS and the DNS will give you back the IP address of the server. So this domain name is nothing but an easy to remember name which can be used instead of an IP address to talk to a server. So you use the domain name, DNS will give you the IP address and then use this IP address and the port number, usually port number is well known, you know, web servers listen at port 80 and so on these numbers are well known. So you get the IP address from DNS then use the port number and together you will be able to connect to the server using the combination of IP and port. And we have seen this, you know, the connect system call for example that we have studied last week takes this IP address port number and opens a connection to the server. And then applications that are running on end host like client server like the socket programs we have seen once they are connected this way they will send and receive messages. You can either use sockets, you can use other programming libraries, it does not matter in the end all of them work the same way where you exchange messages and these messages when you write into a socket the operating system converts these messages into packets and sends them over the network. So the server has opened a socket, the client connects to the server socket, writes some data into the socket then this data goes over the network as packets, who makes these packets the OS makes converts the messages or written into sockets into packets and sends them over the network. And what is this network? This network is nothing but a series of routers, you know, which are boxes that you know pass on the packets from one point to the other from the source to the destination. So all of these routers on the internet they will look at this destination IP address. You say this packet has to go to so and so server the IP address will be there and all of these routers will look up that IP address and forward the packets from the source to the destination. And that is why the packets in addition to the actual message whatever you know you want to access a certain course or a video in addition to that message along with that every packet will also have some headers like I want this packet to go to so and so IP address to so and so port number all of that information also will be there in the packet along with the actual message you will also have the headers. And the routers will route the packets from the source to the destination using the information in these headers. And finally you also have something called transport protocols that is some extra mechanism running at the client and server to ensure various things like reliable delivery congestion control. For example, if the network the path going to the server is very congested you might want to slow down send a little bit later. If some message was lost you might want to resend it again there are all of these mechanisms that are implemented by transport protocols. So this is a rough idea of how a client accesses service any service on the internet. And in the next few lectures we are going to understand each of these different components in more detail you know what are transport protocols, what are routers, how does routing happen from the source to the destination. You know what are the various headers that are present in packets all of these things what is DNS all of these things we are going to study in more detail. The one part that we have already seen last week is this part which is writing messages into sockets and reading from sockets how that works we have already seen. Everything else all other pieces of the puzzle we are going to discover this week. So first let us begin with the domain name system or DNS. What is DNS? DNS is a service that is available on the internet using which you can find out IP addresses of various hosts ok. So the mapping between domain names that is you know human readable names of servers to the IP addresses these mappings are stored in DNS records. And these DNS records are available for all clients to query on the internet. So how are these DNS records stored and how are they accessed? So these DNS records are stored at what are called authoritative domain name servers ok. So there are some special servers which do not serve regular content like you know videos or web pages but these servers actually maintain these DNS records and give them out to whoever needs this information. And of course there are like you know a large number of domain names and IP addresses on the internet. So is everything stored in one server? No it is stored hierarchically. What does this mean? So if you have a domain name like nptel.ac.in ok. So there are firstly there are what are called root DNS servers. So these have information about all the top level domains like .in which is our top level domain here or .com which is the most popular top level domain or .edu. You know there are several top level domains like this specific to various areas or countries and so on ok. And there are certain servers that handle all information about that top level domain like you know there is a server that knows all information about domain names ending in .in something else about .com and so on. And then there are root servers that have information about all of these top level domain servers. So if you want to resolve this name into an IP address you will first go to the root server. The root server says ok I do not know about this name but you talk to the .in server and it will give you the address of the .in server. You will go to the .in server and the .in server will give you the IP address of the .ac.in server ok. So the domain name servers at each level will know about the name servers at the next level. .in will know about all the other .in you know .ac.in .ernet.in all those other servers that information will be known by the .in server. And then you talk to the .ac.in server it will tell you about the .nptel.ac.in server ok. So every server knows about information the IP address of the name servers at the next level. And then that way you keep traversing this tree hierarchically starting from the root to the top level domain server to other other other domain servers. And finally you will reach a server that knows about the IP address of .nptel.ac.in and who is doing all this traversal there are special pieces of software called DNS resolvers. So in every organization every group of hosts together they will set up a DNS resolver and this DNS resolver will do all this work for you. So in your network if you are accessing .nptel.ac.in then you will first talk to your DNS resolver and then this DNS resolver will talk to the root server top level domain servers everybody else will talk to .ac.in server then it will find out the IP address of .nptel.ac.in and then that IP address it will return to you. So DNS resolvers resolve a DNS name by walking down the name hierarchically and once you obtain the final IP address these DNS records this information about you know this mapping from this name to the IP address all of this information is also cached for small periods of time at the local DNS resolver. Why because you know if somebody has resolved this name to an IP address and immediately somebody else will ask then you do not have to do all of this again. You know go to the root go to top level domain all of this you do not have to do it again you can just remember. So the final result as well as the IP addresses of all of these intermediate servers that you had to talk to all of that is also cached at DNS resolvers and of course you cannot cache it forever because what if this IP address changes. So therefore these caches have some information about you know how long you should cache these DNS records for and accordingly they are cached. So DNS is a very important component of the internet today and if you if DNS does not work you cannot access any service on the internet because it is very hard for us to remember what the IP address of that service is we always remember the domain name and therefore DNS needs to be up all the time it needs to be fast. So the performance of DNS is very crucial for various internet applications today. So the next thing is what else is a DNS system useful for is it just to resolve names to IP addresses actually people use DNS for many other purposes for example for load balancing. So suppose this NPTEL server or any server is not implemented at one IP address but there are multiple processes which are all serving content why because you know maybe one server can get overloaded therefore you want some users to access content from this server some from this server some from this server that is you want to do some load balancing across multiple servers and each of these servers will have different IP addresses also. So how do you do that the way you will do that is you will store all these different IP addresses in DNS and DNS will return either all of them or you know one of these whatever. So the client when you talk to DNS you will get back all of these IP addresses then some clients will pick the server some will pick the server some will pick the server and the load balancing will happen. So note that DNS does not have to return exactly one unique IP address for every domain name it can return multiple IP addresses for the purpose of load balancing. So that clients have the choice then the other important concept where DNS is useful is the concept of what are called content distribution networks or CDNs. So what are CDNs? CDNs are basically organizations that take content from websites and replicate this content all around the internet so that you can access it faster. So suppose there is a website that has some web page and say this web page is located at some server in India then clients from all over the world will be coming to this one server in India and accessing it and all the links can get blocked due to too much traffic. So instead of that what you will do is this content can be given to companies which are called CDNs and what these companies they have you know data centers networks all around the world. So they will take this content of a website make copies of it and store it all around the world say one in America one in Africa one in Europe whatever different places they will have their data centers and they will replicate this content. So now if somebody from the US wants to access this website this content they do not have to come all the way to India they will just use a local copy from the CDN that is located in USA. Somebody else from Europe can simply access the copy that is there in Europe. So in this way what CDNs do is they will replicate the content of a website or any other internet service all over the internet in geographically distributed locations. So that everybody can access their closest replica and get content faster. Note that you can only do this for static content that is content that does not change on a website. If you have content that is constantly changing then of course you cannot give it to a CDN you still have to go to the main server for that. And CDNs how do CDNs work they work using DNS. So when you resolve a domain name to an IP address in DNS then DNS will find the closest replica of the content geographically that is a CDN has replicated content it will tell DNS. If somebody from this region request for this domain name give them this IP address. If somebody from another region ask for the same domain name give them the IP address of this server that is geographically closer to them. So CDNs work along with DNS in order to redirect request to the geographically closest CDN server replica. This ensures that content is accessed much faster on the internet today. So once again CDNs are also very important part of the internet infrastructure today and are hugely responsible for ensuring that we actually access content on the internet much faster. So next we come to IP routing. So we will study this in more detail in a later lecture but I would like to give you a high level overview of what is IP routing. So every computer or every end host on the internet has a unique IP address assigned to it. Usually it is not the end host it is the interface but we will come to that a little bit later. So for now just assume that every end host on the internet has a IP address given to it. And what is an IP address IP stands for internet protocol and what is an IP address IP address there are two versions of this protocol V4 is what is commonly used. So an IP V4 address is nothing but a 32 bit string and the way you write it is something like this. You know you take this 32 bits split it into 4 groups of 8 bits each and these 8 bits will be a number between 0 and 255 in decimal and then you will take this number dot this number dot this number dot this number and that is how you will represent an IP address in the dotted decimal notation and but inside this notation if you look at it is nothing but a stream of a string of 32 bits. This uniquely identifies every end host whether you are a client, server, you are you know an e-commerce website, you are a browser whoever you are you know you are browsing the internet, you are watching videos whoever you are on the internet you will get a unique IP address. So 32 bits are also not enough because you can only address 2 to the 32 hosts what if you have more hosts than that and we do today and therefore the internet is upgrading itself to a next version IPv6 where the addresses are 128 bits in size but let us leave that aside for now. So you have some IP addresses whether IPv4, IPv6 that uniquely identify hosts on the internet and then all of these hosts are connected by routers. So there are many routers on the internet today whose job is to connect all of these millions and millions of computers that are part of the internet these routers this infrastructure of routers connects all of these end hosts and whenever one end host sends a packet to another end host that packet is forwarded to the correct destination by these routers. And how do the routers do this? They look at the destination IP address and then route a packet on approximately of course you will not go all round and round and send it, you will try to send it on the shortest possible path to the destination. So suppose here is a client and here is a server and there are you know many paths to go to this server then each router on the path will pick the shortest path from the client to the server and forward data from the client to the server, server to the client and so on. So the client is sending many packets and each of these packets will have the address of the destination will have the server's address inside them and each router will look at this destination address and send the packet to the server. So then the next question comes up how do these routers know what is the path? You know how does this router know I should send the packet here, how does this guy know I should send the packet here? So which is why one way to do this is you could have had sort of a centralized design where you know somebody knows the entire topology of the internet and they will tell every router hey you know you send the packet here to reach this guy and so on. You know this information can be told to all the routers but of course the internet is so huge it has billions of hosts and nobody knows the entire topologies so that is hard to do. Instead these routers run what are called routing protocols that is they exchange some messages between themselves to figure out all of these paths on the internet. So these routing protocols run at the routers so these routers talk to each other, exchange some information and as a result of that they are able to compute the shortest paths between any two hosts on the internet. And once they know the shortest path they will use that during forwarding. So every router does the routing protocol part of it which is calculating the shortest paths and then using this information for actually forwarding the packets sent by the end hosts. And one thing to note is that the internet is not just one coordinated network maintained by one person it is many different networks you know running many different applications set up by many different organizations and all of these are united by this IP routing addressing and forwarding. So whichever network you are as long as you can forward IP packets you can become part of the internet you know your server could be running on in some data center managed by somebody else your client could be you know your browser on your phone inside a mobile data network does not matter as long as everybody understands IP addresses can forward IP datagrams we can all talk to each other. In some sense IP unites all of the different networks that comprise the internet today. So next what I am going to talk to you about is some sort of design principles behind how the internet was designed what was some of the thought process that went into designing the internet. So the internet today follows what is called a packet switching architecture that is you have end hosts say your client and server and you have a bunch of routers or you know devices that switch packets along the path and each of these routers does not really know that you know you are talking to the server right they are not dedicated to just forwarding your traffic. So they forward traffic from many hosts to many other hosts. So there is no notion of an end to end connection on the internet today. Every router it look at every packet every packet has header information that says where this packet should go where it is coming from where it should go to and every router will look at each packet independently and forward it as best as it can. And sometimes if a router is busy when a packet comes up it may not be able to immediately send the packet it might store the packet that is why these routers are also called store and forward switches you know you can when a packet comes you store it for some time then when you are when the router is free it look at the header decide what to do with the packet and send it along. So sometimes some router may not know what to do with the packet it may not have any place to store the packet it might be too busy it can drop the packet anything can happen. So which is why it is called best effort forwarding on the internet today. And of course every packet because it needs to have a header it adds also some overhead especially for small packets. So this is a very different architecture from what was used before the internet came up when you had telephone networks that was called the concept of circuit switching. What is circuit switching in old and telephone networks when you made a call when somebody dialed somebody else then a notion of a path would be established along all routers between this starting point of the call and the terminating point of the call. And all of these routers would be aware of this connection they would reserve some resources for this connection because you know why traffic is coming in continuously they would reserve resources and everybody along the path knew that this connection was set up. The call ended this connection would be taken down that is everybody knew of the end to end connection maintained state reserved resources for this end to end connection. That is called circuit switching in the internet today that does not happen if you are talking to an NPTEL server nobody no router on the path knows about it or cares about it. You send a packet they will forward it that is all. So why is this why have we moved away from circuit switching to packet switching the reason is that internet traffic is bursty. So you will download some set of packets from the NPTEL website you know you will send a request get some data then why as long as you are reading that page and until you click on the next link nothing is happening there is no traffic being sent. So internet traffic is bursty that is you will send some traffic for a long time there will be nothing then this connection will send some more traffic for a long time there will be nothing. So when you have bursty traffic it does not make sense to set up a connection reserve resources tell everybody about it everybody keeps state about it you know remembers this connection it does not make sense why because multiple connections can share the same resources you do not have to reserve any resources for the connection unlike voice traffic which is more uniform and goes on for continuously voice traffic is sort of more constant rate whereas internet traffic is bursty. Therefore the internet works on the principle of packet switching if you have something to send I will send it if you do not have anything well and good but I am not going to make any special efforts for any particular connection that is the design philosophy of the internet. But then you might be wondering then what about all this connection based sockets you know we established connection to a server and all of that what is happening to all of that. So all of that information the fact that you are talking to the server and you know some ways of establishing reliability all of that only happens at the end host. So this argument this design philosophy is called the end to end principle or the end to end argument in internet design whatever complexity you have you know for reliability you have to remember that I have sent so many packets that packet is lost I have to retry it again you know I have to slow down the server is too slow all of that logic happens only at the end host nothing happens in the middle of the network the middle of the network the switches are simple they will just live life one packet at a time look at a packet forward that is all they are not aware of any end to end connections but the end host who are talking to each other they are aware of these end to end connections and whatever logic you have pertaining to a connection like reliability or congestion control or retransmissions whatever you want to do specific to that connection you will only do it at the end host. So this is called the end to end principle on the internet and this has greatly simplified the internet architecture and this principle is responsible for this humongous growth of the internet today because the network switches themselves are very simple there is no complexity there. And this end to end logic this is implemented in transport protocols which run at the end host. So popular transport protocol is what is called TCP or transmission control protocol. So this runs at the end host and does various mechanisms has various mechanisms in order to deliver all the bytes of a connection reliably and in order you know for example the logic that is running at the end host you will slow down the sender if you find that the network is congested and if you find or you find that you know the receiver is too slow you know you will do what is called congestion control which is slowing down the sender if the network is congested, slowing down the sender if the receiver is congested that is called flow control. If some packet has been lost some router dropped it you will retransmit that packet. If packets have been jumbled up in the network you will reorder them give them to the application in the correct order you know when you write into a socket you will read on the other end from the socket in order not some jumbled up garbled stream. So all of this logic is guaranteed only by the end host not by the routers that are in the middle and we have several transport protocols today like TCP, UDP, STP we will see them later. And different applications can use different can reuse all of this functionality of a transport protocol. So every application does not have to do congestion control reliability on its own if you simply use TCP sockets you will get all of that functionality. And what is a protocol when I say a transport protocol or all of these the word protocol is used often in networking what does it mean just means you know a way for two entities to communicate with each other you know what messages should I exchange what packet format should I use you know all of these things are specified by the protocol. And the internet has many protocols that are in use. And finally the one final concept with respect to how the internet is designed that I would like to discuss is the concept of layering. So the internet is built as a series of layers internet software and hardware is built as a series of layers and layering is nothing but a way to impose modularity in computer systems. If you have many different entities like there are routers, there are transport protocols, there are applications there are many different things on the internet. And if everybody is talking to everybody else it gets very unmanageable. So therefore you divide all of these entities into layers and only the entities in one layer will talk to each other, and the entities in the other layer are not aware of the entities in one layer. So the lower layers will talk to each other and they will provide some service to the higher layers. And the higher layers will talk to each other, provide some service to the layer above them and so on. years in a layer talk to each other and the lower layers provide some service or some abstraction to the higher layers. So the internet what are the various layers in the internet today? So first you have the clients and servers, you know you have your browser that is talking to a website, all of these are clients and servers on the internet and they exchange information, they exchange messages with each other, all of this is called the application layer. And these messages are sent reliably by the transport layer. So the transport layer like TCP takes care of taking this message, ensuring that it reaches reliably on this connection on the other side. And finally you have the next layer is you have the network layer or the IP layer. So this is called layer 4, this is called layer 3, these IP layer is just forwarding your packets. You do not care whether it is from a browser to some server to any commerce website to a ticket reservation website, you do not care, the IP routers simply just forward packets from one IP address to the other, they do not know what is there inside the packets. So note that can you see the concept of layering here, this guy does not care about this guy, this guy does not care about this guy, each of them works independently to do their job. And then you have the layer 2 underneath this network layer you have what is called the link layer that basically on each of these link, how do you forward packets from one IP router to the other, how is traffic forwarded, that is taken care of by the link layer because you have many different kinds of links, your wireless links, wired links, all of that is handled by the link layer, the link layer has a bunch of switches which forwards traffic from one IP address to the other. And finally you have the physical layer, layer 1 which is actually how are bits sent over the wire or over the air, okay. So this concept of layering simplifies the design of network software and hardware because each layer is only worried about its functionality and not of what is happening everywhere else on the internet. And where are all of these layers situated? So the application layer, the actual layer that sends and receives messages. This is usually built in the form of user software, okay, software that sends and receives messages using sockets or you know any other such networking library, that is your application layer. And the transport layer is located mostly inside operating systems which exchange. So the transport layer will take the message sent by the application layer, add its own headers to it and this is called a segment, okay. The transport layer makes a segment and all of this is done by the operating system and then these segments are sent out on the network and modern operating systems support many transport layer protocols and all of this transport layer logic is implemented inside the OS when you read and write into sockets. Then you have the network layer that runs at the end host as well as all the IP routers on the internet, all of these IP routers they run the network layer and what they do they forward IP datagrams, note that the packet is a generic term but each layer will give this packet a separate name like application layer message, a transport layer segment and IP datagram and so on. Then finally the link layer it is present in all the network cards, device drivers, switches which forward packets and at the link layer these packets are called link layer frames between one IP router to the other on this link whatever happens that is the link layer. And finally you have the physical layer which runs fully in hardware inside network cards and routers and all the hardware it basically deals with how do you send bits and how do you convert these bits into analog signals you know like electromagnetic waves you know there are techniques called modulation demodulation how you convert a bit into a wave and send it out all of that is taken care of by the physical layer. So this is the location of the various layers on the internet at the end host you have the application you have the transport layer then at all the routers you have the network layer on each link you have the link layer and finally on each wire or wireless physical media you have the physical layer right. And how do you a good way for you to visualize all of these layers is to use what are called packet capture tools ok. So wire shark is a popular example of a packet capture tool you can install this tool on your computer and actually see all the packets flowing on your network and if you click on each packet. So if you look at you know this is the interface this is how the wire shark screen looks like there are many packets and if you click on each packet you can actually see all the layers you know you have one payload and this payload is put inside various headers various headers of various layers and you know you can also have footers but you mainly have headers and then how the packet is built innermost there is some layer there is some message that is put in this application layer headers, HTTPs the application layer protocol then there are TCP headers you can click on each of these and see what are all the headers added and what are all the layers at work ok. And each layer note that it adds its own headers and on the other side when you are receiving a packet you will keep removing these headers ok. So this is a summary of what we have seen in this lecture today we have seen how the internet works we have studied principles like layering, packet switching, the end to end argument and with this lecture I hope that you have a rough idea of what happens when you access any internet service over the network. And to help you understand these concepts better please install a tool like wire shark which is very easy to install very easy to use and capture some packets you know maybe when you are browsing a website look at all the packets that are going in and out of your computer look at all the protocols, the headers, the various layers to you know see all of this in action in front of your eyes. Thank you all that is all I have for this lecture we will continue a discussion on all of these topics in this week. Thanks.