So we are going to discuss networking technologies today. This is well known: the internet and the web permit access to information stored on different computers across the world, and this accessibility is to be incorporated as a component of our information system. This is the basis of distributed systems. Distributed systems is not a technology; it is a paradigm, a methodology, and it heavily uses the notion of networking. We have already seen the most primitive computing model: a single computer hosting and running our programs, accessed from a monitor and keyboard attached to that computer itself. We are already familiar with the notion that there could be a server connected to multiple terminals or PCs on a local area network, and we did discuss the two-tier and three-tier implementations. We are now talking about extending that: understanding both local area networking technology and wide area networking technology. In fact, a technology which permits interconnection of any two intelligent devices, which, as we shall see, also includes networking of components inside a computer. If you have a processor and memory, there is some connectivity between them. If a processor is executing a program that does some disk IO, information flows between the processor and the disk. We normally call that connection the bus, which is internal to a computer, and that is the reason why we ordinarily discuss computer technology before networking technology. But networking being relevant, that is what we are going to discuss now. Distributed databases are possible because we can share resources. Now we are talking about multiple servers located at different places.
You can achieve higher reliability, because even if one system fails, you can ensure that some other system capable of doing identical information processing tasks is available. Affordable information systems are possible because you do not have to go near the computer to exploit it: you can be sitting anywhere else, and as networking costs come down, access to the information system becomes affordable. And of course you can scale, because you can put multiple servers not only at a single place but at multiple places, and therefore support a much higher number of transactions and a larger number of users. These are some fundamental objectives behind any networking, not necessarily wide area networking. We shall see the networking types in a moment. Here is the scale of networks. You can have a small network which comprises a single circuit board, or a computer system which spans typically a metre. Within a room you could have a network spanning about 10 metres; similarly a building or a campus. Computers connected to each other in a room, in a building, or even across a campus typically qualify for the local area network label. All the computers you see connected in this campus, for example, are on a large local area network within the campus. As many of you would know, we run optical fibres through switches and other equipment to connect them. So actually it is not a single local area network; as we shall see, there are multiple local area networks which are themselves interconnected using LAN technology. Whereas when you go beyond a campus, across physical distances where local area network technology is either not feasible or not cost effective, you create another kind of network, which is called a wide area network.
In fact, wide area networks and local area networks overlap at the scale of a campus: within a campus you could have a wide area network which connects multiple local area networks, or a local area network which connects multiple local area networks. And of course when the distances grow large, you are necessarily required to use a wide area network. We just saw the purpose and objectives of networking; our idea now is to understand how exactly these computers are connected to each other, how exactly they exchange information, and, from our perspective as information scientists, what is of relevance. We shall discuss a few things in rather great detail, but you may wish to omit some of those details. What is important is the principle of operation, and I will mention the things that are relevant to us so you can remember them better. Here are the basics. We want communication between any two computer systems on the network in order to exchange information. That is the simple objective. Now, when you say information, it could be a file you want to transfer from one place to another, it could be a large message, or it could be just "hello, how are you", a small message. The quantum of information cannot be predicted ahead of time; it can be as small as a few bytes or as large as a few gigabytes. Since the amount of information to be exchanged is indeterminable in advance, you have to standardize some way of exchanging it, and the standard way is to break the information into smaller chunks which are called packets. This breaking up has to be done by the sender. Suppose I want to send some message or information to him: on my machine I must ensure that, whatever the amount of information, I break it up into packets and send those packets one by one to him.
His computer collects those packets and reassembles the original sequence. Very obviously, if that has to be done, the packets must be tagged with some kind of sequence number. The issue is that when I transmit, it is not guaranteed that my sequence of packets will be maintained by the communication medium: the fifth packet may reach before the third packet. Therefore there is a task that his computer will have to do at the other end to reassemble them. But this is the fundamental principle of modern communication: you do not send any amount of information as one huge chunk; you break it into smaller pieces called packets. In fact, this technology is now so versatile and standardized that even voice communication works this way. When you speak, particularly long distance, you believe you have a physical connection and your voice is being carried in analog mode. Those days are over. In modern telephone exchanges, your voice is actually sampled, converted to digital form, and broken into packets. These packets travel and are reassembled at the receiving exchange, converted back to voice, and sent on. So packetization is an absolutely standard part of modern communication. We are of course concerned with packetization only in the context of discrete information, bits and bytes, because that is what we are talking about. The example I gave was to illustrate that even if you have voice, or image for that matter, as long as you have digitizable analog information you can digitize it. Once you digitize, you get a sequence of bits. A certain number of bits might form a meaningful chunk, like a pixel, which may be 6-bit or 8-bit, or a voice sample, which could be 4-bit, 8-bit or 16-bit, but all this information can be packetized in a suitable form to be transmitted. Now, this is the fundamental requirement.
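The packetize-and-reassemble idea just described can be sketched as a toy simulation. This is not any real protocol, just an illustration of why sequence numbers let the receiver rebuild the message even when packets arrive out of order; the packet size and message are arbitrary choices.

```python
import random

PACKET_SIZE = 4  # bytes per packet; deliberately tiny for illustration

def packetize(data: bytes):
    """Split data into (sequence_number, chunk) packets."""
    n_packets = (len(data) + PACKET_SIZE - 1) // PACKET_SIZE
    return [(i, data[i * PACKET_SIZE:(i + 1) * PACKET_SIZE])
            for i in range(n_packets)]

def reassemble(packets):
    """Sort by sequence number, then concatenate the chunks."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"hello, how are you?"
packets = packetize(message)
random.shuffle(packets)  # the network may deliver packets in any order
assert reassemble(packets) == message
```

The shuffle stands in for the fifth-packet-before-the-third behaviour of the medium: because each chunk carries its sequence number, sorting restores the original order.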
Basically we want that any network we construct must ensure that all packets of information eventually reach the destination. That is clear: if some packet is lost, I am gone. Also notice the word "eventually". We do not necessarily want the network to guarantee in all circumstances that the packets must reach immediately, or that all packets must reach in a given sequence. We are not insisting on that. All that we are insisting on is that eventually all packets must reach. There are some applications where this word "eventually" does not apply: things have to happen in real time. Voice is an example. How would you like it if, instead of hearing "hello, how are you" continuously, you heard it in pieces, each arriving some 30 milliseconds late, before it made sense to you? There the sequencing must be appropriate in real time, and therefore the word "eventually" has a different connotation. Now, since we are talking about communication between two computers: each computer, incidentally, is called a node. Any computer or any intelligent device connected to a network is a node. So communication is essentially between nodes, and nodes must have addresses. How do you recognize, out of the millions of computers that may be connected on the network, that this is my computer sending packets to his computer? My computer and his computer must be uniquely identified among all the thousands of nodes that you have. Consequently, you need a unique addressing mechanism. Then you need a communication protocol. Even supposing you have unique addresses, the channel on which this communication happens is most likely a shared channel. Just as I am trying to send packets to him, somebody else is trying to send packets to someone else, and all these packets should not get confused with each other. So to establish proper and valid communication between any two nodes, there must be a protocol.
A protocol is nothing but a set of rules through which communication is established and is guaranteed to be error free and complete. These protocols have evolved as standards; there are in fact multiple protocols. A protocol, by the way, is nothing but an agreed arrangement between any two parties for the exchange of information. There are political protocols: when a president of a country visits, he will be received by so and so; if a prime minister visits, he will be received by so and so; if a professor visits, he will be received by nobody, etc. That is a protocol. Protocols are very much standardized in every aspect of life; here we are talking about the communication protocols between nodes. There is an international standard, generically called IEEE 802.x, where x is the generic part. It started with 802.1, 802.2, 802.3, 802.4, and all these standards have evolved. Anybody has an idea of what the current standard is? 802.11? Yes, 802.10 and 802.11 are roughly the standards you will notice. If you go to your internet browser, or if you have a laptop, you can easily check what the protocol is. There are multiple standards, and no final word has been said. He is right in his observation that 802.10 or 802.11 is not the end of communication protocols. Protocols will continue to evolve as new technologies and new methodologies evolve. And what is the purpose of the latest standards as opposed to the earlier ones? They guarantee faster speed, better reach, and fewer and fewer errors in transmission. That is the purpose of the evolving standards. So, in a nutshell: we must ensure that all packets reach, we must have an addressing mechanism, we must have a communication protocol, and IEEE 802.x is the generic standard name for all of this. We shall study these things quickly, one by one. Now, here is something that we must have on every computer on the network.
First, some hardware to connect to the network. After all, a computer has a processor, memory, disks and so on, but it has to connect to the outside world, so you need some hardware inside. That hardware is called a network interface card, or NIC. The terminology dates from the days when there used to be a physical card which would go onto the chassis of the computer along with the processor card, memory card, etc. Nowadays there is often only one board that makes the whole computer: you put different chips on it which represent different components, and one of them is the network interface chip, but it still continues to be called a network interface card. Of course, you get external cards as well. This is the hardware. Then you require software. What software is required? Remember we said that if two computers have to communicate with each other, they have to follow a certain protocol. Who implements that protocol? Who ensures that the steps of the protocol are correctly executed? Some piece of software. That piece of software is called the protocol suite. One of the most popular protocol suites, in fact I would say the standard now across all computers in the world, is a suite called TCP/IP. We shall see more about TCP/IP in a short while in this session. So you have hardware, and you have software which takes care of the protocol implementation. Now you need to connect this hardware to similar hardware and protocol software on his machine or any other machine. So you need some medium, a physically connected medium through which the physical signals will pass. This physical medium can take various forms: cables, radio links, telephone links, whatever. So wired or wireless; Wi-Fi or WiMAX, which he mentioned, is effectively wireless communication.
If you have an Ethernet cable which you typically connect to a switch, that is an example of a cable. You have copper cables extending to telephone exchanges, which we use to push our digital traffic. And you have large optical fibres which run across the Atlantic and Pacific oceans on the ocean bed, carrying such packets of information. So this is the physical medium. Three things, then: hardware on a node, software on a node, and a physical medium. Every node will have hardware and software, and the physical medium will connect two or more computers. We shall see how exactly these things happen. Remember the term TCP/IP, which is the most popular protocol software we use, and NIC, which people tend to use as the word for the network interface card. Now we talk about types of networks. There are basically two: what we call broadcast networks and point-to-point networks. First we shall discuss the broadcast network. A broadcast network has a single communication channel shared by multiple machines. The immediate comparison that comes to mind is a radio transmitter; that is why you call it broadcast. Or a TV station, from which TV signals are broadcast. Casting is giving to someone; broadcasting is giving to multiple people. So broadcasting means signals go out into the medium, and anybody connected to that medium can tap into them. If I am broadcasting, any of you can listen. We do not see a problem with this as long as there is only one radio or TV station transmitting. But imagine there are 20 computers on a shared medium and each one of those 20 jokers is transmitting. Now there could be chaos: signals emanating from multiple machines, colliding with each other. In such a shared space, it is very easy to see signals colliding and causing confusion.
The easiest example: imagine that in a classroom like this, I am broadcasting and you are all receiving, and you are all receiving identical information. Now suppose we want to communicate with each other and everyone starts broadcasting. What will happen? Chaos. Suppose he wants to tell something to me, but simultaneously she is trying to say something to him, and so on; my ability to discern what he is telling me, even to understand that he is trying to tell me something, will be severely compromised. So if I am using this kind of network, one natural question to ask is: how can it work at all? Curiously it can work, and curiously that is the standard way most local area networks in the world operate. We shall see how exactly they operate and how the confusion is avoided. The key is this: there is a single communication channel, and packets sent by one system carry a destination address and are handled only by the addressee, and ignored by every other system. This last part is important. If he is sending packets to me, since he is broadcasting, each one of you has access to those packets as well. But your computer and its protocol must ignore those packets, because he is not sending them to you. How will your computer know? Because the packet contains the destination address. If the packet contains the destination address of my computer, your computer will ignore it, though it must still receive all of it. And that is the beauty of why chaos does not happen in a computer network: each computer is free to transmit packets, we shall see how collisions between packets are avoided, and only the intended recipient handles the packets; others ignore them. That is the principle of broadcast networks. So, is this clear? You can actually address packets to all systems.
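The "handled only by the addressee" rule just described can be sketched as a toy simulation. The node names and packet fields here are illustrative, not any real frame format: every node on the shared channel sees every packet, but only the one whose address matches the destination keeps it.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    src: str        # sender's address
    dst: str        # destination address carried in the packet
    payload: bytes

@dataclass
class Node:
    address: str
    received: list = field(default_factory=list)

    def on_packet(self, packet: Packet):
        # Every node on the shared medium receives the packet...
        if packet.dst != self.address:
            return  # ...but ignores it unless it is the addressee
        self.received.append(packet.payload)

nodes = [Node(f"PC{i}") for i in range(1, 5)]  # PC1..PC4 on one channel

def broadcast(packet: Packet):
    for node in nodes:  # the shared channel delivers to everyone
        node.on_packet(packet)

broadcast(Packet(src="PC1", dst="PC3", payload=b"hello"))
# Only PC3 keeps the payload; every other node drops it silently.
```

The point of the sketch is that the filtering is done by the receiver, not by the channel: the medium itself delivers indiscriminately.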
When packets are addressed to all systems, it is called a true broadcast, like a radio station. Or packets can be addressed to a subset of systems: I can send certain packets to only 10 systems, another fellow can send to a subset of 5 systems. That is called multicasting. For example, take the technology we used prior to this session for the remote centres: there is a single satellite which takes signals not only from our centre but from multiple other sources, and our signals were going to a set of 14 receivers only. So we were using a technology called IP multicast: the video and audio signals were converted into IP packets, and those packets were transmitted through IP multicast to a subset of systems, where they were received and used. So: broadcast and multicast. To distinguish the two, you call the former a true broadcast, since the entire network is a broadcasting network anyway. The second type is point-to-point communication. This of course makes logical sense to us. A point-to-point network consists of multiple connections between pairs of systems, and this can be equated to your telephone connections. When you pick up and dial a number, you are given a logical connection, not necessarily a physical connection, to the dialled number. The same cables may be carrying multiple signals, but they are not confused with each other, and the network is not of the broadcast type. So if he is talking to somebody on another telephone pair, as long as the network is what we call non-blocking, meaning it has the capacity to identify different signals and carry them together, what is established is a point-to-point network where any two points are talking to each other. Here the packets may travel to the destination across several systems.
For example, if I am making a phone call from here to a friend in Kolkata, my packets will travel from here to my local exchange. From the local exchange they could travel to some major exchange in Mumbai. From Mumbai, on some trunk line, assembled along with several other packets, they might go to Delhi; from Delhi to Kolkata; from the main exchange to a local exchange; and from there my packets will be separated out and travel to my destination, my friend. So point-to-point networks are very important from the perspective of carrying aggregated traffic and de-aggregating that traffic at any final point you want. This is different from the notion of broadcasting. Just as I told you that most local area networks use the broadcast type, you will agree that most wide area networks will have to be of the point-to-point type. For example, imagine that some packets about Mexican cuisine are emanating in Mexico, requested over the internet by somebody in Coimbatore. There is no point in those packets being transmitted all over the world to millions of machines; that would clog up the network. So there you need a point-to-point connection, and that point-to-point connection, equivalent to a telephone connection, will actually be set up by the service provider. Consequently, as a thumb rule, most wide area networks will deploy some kind of point-to-point technology, whereas most local area networks, as we shall see, deploy some broadcast kind of technology. Clearly, when packets have to travel across several systems, there may be multiple routes. For example, my phone call, instead of going to Kolkata via Delhi, might be routed via Nagpur or via Chennai. If there are multiple routes, the service provider will decide which is the optimum way to take my packets to the final destination. But to do that, you require special routing mechanisms.
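The route-selection idea can be sketched as a tiny next-hop table. The network and router names here are entirely hypothetical, loosely echoing the phone-call example above; real routers maintain and update such tables automatically, but the lookup principle is the same: for each destination, remember only the neighbour to forward to next.

```python
# Hypothetical next-hop table for one router: destination network -> neighbour.
routing_table = {
    "kolkata-net": "delhi-router",    # reach Kolkata via Delhi
    "chennai-net": "mumbai-router",   # reach Chennai via Mumbai
}
DEFAULT_NEXT_HOP = "mumbai-router"    # fallback for unknown destinations

def next_hop(destination_network: str) -> str:
    """Pick the neighbour to forward a packet to, or the default route."""
    return routing_table.get(destination_network, DEFAULT_NEXT_HOP)

assert next_hop("kolkata-net") == "delhi-router"
assert next_hop("unknown-net") == DEFAULT_NEXT_HOP
```

Note the design choice: the router does not know the full path, only the next hop; each router along the way repeats the same lookup, which is what lets the provider reroute traffic via Nagpur or Chennai without any endpoint noticing.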
So you have a thing called a router, which decides that these packets will go this way and those packets will go that way. This will have to be incorporated. Everything clear so far? Now we come to discuss local area networks, where in a small room, in a building, or even in a campus we are trying to set up connectivity between multiple computers. A local area network is clearly privately owned: you do not want multiple people unknown to each other trying to use this network. That is a fundamental characteristic of a local area network. You cannot have a local area network spanning 20 people belonging to 20 different organizations, where the medium itself, wire or wireless, goes through two or three different service providers. That is not possible; a local area network is generally privately owned. It is typically within buildings or campuses, and it is used to connect PCs to a departmental or a campus server. Broadcast technology is the most typical technology used in a local area network. We shall see examples of local area networks using the bus or ring topologies. A topology is the physical arrangement of the medium across the computers: how they are connected and how messages are passed. The bus topology and the ring topology are the two most popular topologies for local area networks. Here is a bus topology. In the bus topology there is a bus, but there is a confusion here: in computer lingo, a bus is a pathway across which information flows, whereas in human language a bus is a physical vehicle which carries people along. So forget that bus; the road on which that bus rides is the bus in computer terminology. Here is the road: this is the medium, and multiple PCs are connected to it. Ethernet is probably the most popular example of this kind of bus topology, where the PCs talk to each other. The transmission control in this protocol, which is IEEE 802.3, is decentralized.
Let me go back to the previous slide and explain. Here are PC1, PC2, PC3, PC4, ..., PCN. Let us say PC1 wants to communicate something to PC3. PC1 will start transmitting, one packet at a time, onto this bus. Each packet will travel all across, because it is a broadcast signal: it will reach PC2, PC3, PC4, PCN, everybody. Everybody will receive it, but most will ignore it, because it is addressed only to PC3. PC3 will collect that packet, analyze it, use it, consume it. If PC3 wants to send some packet back, for example an acknowledgement that it has received this packet, it will transmit a packet which will again go all the way; others will ignore it and PC1 will get it. If nobody else is transmitting except PC1 and PC3, and if they are taking turns to transmit, you can see that there will never be any confusion: the communication will be absolutely undisturbed. Unfortunately, neither PC1 nor PC3 knows in advance when some other computer, for example PC2, might want to transmit to PCN. PC2 does not know that somebody else is transmitting. Whenever PC2 feels ready, it has packetized a lot of information, say 25 packets, and it wants to send the first packet to PCN. There is a chance that the packet is transmitted and moves through without meeting any other traffic. But there is also a chance that exactly when it transmits a packet, PC1 has also transmitted. The amount of time that a packet remains on the medium is minuscule, so ordinarily, even if multiple machines are transmitting, these transmissions should not clash with each other. However, there is no guarantee; it is a statistical thing. Imagine now that both PC2 and PC1 transmit simultaneously. What will happen? The packets will collide. One important thing about such a collision is that it can be detected by everyone, including PC1 and PC2. How?
Because when the packets collide, there will be a disturbance, and they will lose the original, well-defined character of a packet. That means everybody will receive some trash, everybody will know that this is not a packet, and everybody will understand that this has happened because of a collision of two packets. The moment it is understood, it is understood by PC1 and by PC2 as well. Imagine nobody else is transmitting at this time. Since PC1 and PC2 both understand this, what they do is: after some random wait time, they retransmit. Because the wait time is random, it is unlikely that PC1 and PC2 will again transmit at exactly the same time. Theoretically there could be a collision again; again they will do the same thing, and in the worst case they could get into a loop, everybody sending a packet, detecting a collision, and retransmitting. But statistics is a beautiful science which works in large numbers, and therefore this technology of collision detection and retransmission works on Ethernet. This, then, is the principle: at any instant, any one machine can transmit; if packets collide, each machine retransmits after a random time. And surprisingly, you can achieve great speeds of faultless communication between different machines, where packets move without collision, to the extent of 10 to 100 megabits per second; recent technologies permit transmission at gigabit-per-second speeds, and the modern channels are gigabits per second. This is possible because not only are computers becoming fast, but packets stay for a very, very short time on the broadcast network. Remember, because it is a local area network, things are happening at the speed of electrical signals, which is close to the speed of light, and that is far faster than these megabit and gigabit rates demand; that is how things still work without any problem.
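The collide-and-retry argument above is statistical, and a small simulation makes it concrete. This is a hedged sketch, not the actual IEEE 802.3 algorithm: two stations that collided each pick a random slot from a window, and the window doubles after every repeat collision (the binary-exponential-backoff idea), so the chance of colliding again shrinks rapidly. The attempt limit and seed are arbitrary.

```python
import random

def attempt_until_success(max_attempts=16, seed=42):
    """Return the attempt number on which two colliding stations finally
    pick different backoff slots (None if they never do)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    for attempt in range(1, max_attempts + 1):
        # After each collision the contention window doubles,
        # so repeat collisions become rarer and rarer.
        window = 2 ** min(attempt, 10)
        slot_a = rng.randrange(window)  # station A's random wait
        slot_b = rng.randrange(window)  # station B's random wait
        if slot_a != slot_b:            # different slots: no collision
            return attempt
    return None  # vanishingly unlikely within 16 attempts

print(attempt_until_success())  # typically succeeds after very few attempts
```

Already on the first retry the chance of colliding again is only 1 in the window size, which is why, as the lecture says, the worst-case loop is a theoretical rather than a practical worry.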
However, there is a limit on the number of nodes that can be connected like this before collisions become excessive, and that is one reason a local area network is limited to a certain number of nodes. But the principle is understood: if packets collide, the machines retransmit after a random time which is internally generated. The ring topology, on the other hand, was developed by IBM in the early days of networking. The IBM people did not like this notion of packets colliding with each other and having to rely on something statistical; they wanted guaranteed communication. So they constructed a topology in which all nodes are connected through a ring. Packets are still transmitted to everybody, so any packet can reach anybody. However, they created a special packet called a token. It is like a baton in a relay race: the fellow who is running carries the baton, and the next fellow can run only when he collects the baton from this person; till then he cannot run. Similarly, although the nodes are on a broadcast network, nobody is permitted to transmit unless they hold the token. This token is a special packet, and it moves around as a separate transmission from node to node. Suppose PC2 currently has the token. Because nobody else has the token packet, and the token packet is unique, only PC2 is permitted to transmit. So PC2 will transmit a packet and then immediately pass the token to PC3. The packet might reach PC1 and PCN as well, but they will ignore it. When PC3 gets the token, PC3 can transmit. Observe that there cannot be any collision in this network, because at any time only one fellow, the one who holds the token, is permitted to transmit. So this is called a token ring protocol. It was perfected by IBM, and when the Ethernet people were able to give something like 2 megabits per second, the token ring people were able to achieve 4 megabits per second.
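The token-passing rule just described can be sketched in a few lines. This is a toy model, not the IEEE 802.5 frame handling: the token simply visits each node in ring order, the holder transmits at most one queued message, and collisions are impossible by construction because only the holder ever transmits. Node names and messages are illustrative.

```python
class RingNode:
    def __init__(self, name, outbox=None):
        self.name = name
        self.outbox = list(outbox or [])  # messages waiting to be sent
        self.sent = []                    # messages actually transmitted

    def hold_token(self):
        """While holding the token, transmit at most one queued message,
        then (implicitly) release the token to the next node."""
        if self.outbox:
            self.sent.append(self.outbox.pop(0))

ring = [RingNode("PC1", ["a"]), RingNode("PC2", ["b", "c"]), RingNode("PC3")]

# The token circulates around the ring; at any instant exactly one
# node holds it, so exactly one node may transmit.
for _round in range(2):
    for node in ring:
        node.hold_token()

assert ring[0].sent == ["a"]
assert ring[1].sent == ["b", "c"]
```

Notice the trade-off the lecture points at: the guarantee of no collisions is bought by making every node wait for its turn, even when the ring is otherwise idle.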
When the Ethernet people were able to give 10 megabits per second, IBM was able to give 16 megabits per second, guaranteed, for a certain number of connected nodes. However, Ethernet went on to achieve larger speeds, and token ring was not adopted by anybody other than IBM. IBM persisted with the technology for some time, finally gave up, and started using the same broadcast technology, which is Ethernet. But you can see the simplicity and beauty of this protocol, in which there is never any confusion. So is it clear how the token moves as a special packet, and how only the token holder can transmit, so there is no collision? In this ring topology, packets move around a ring, and transmission control is achieved through this special mechanism. The protocol is called the IBM token ring protocol; IEEE 802.5 is the standard name. The standard still exists, by the way, but nobody uses it. You see, networking is a very funny thing: it does not matter which protocol I like. If I am the only one who likes it, it is useless, because if I have to communicate, others also have to adopt that protocol. Universal acceptance is an important criterion in networking technology. To summarize: a token is passed along the ring; the token holder, and only the token holder, transmits; it can operate at 4 and 16 Mbps. These are the old standards which IBM actually defined. Now imagine you want to connect to some server which is on a local area network different from yours. How do you connect two local area networks? For this you have to evolve the concept of networking the networks. Inter-networking, therefore, is the key phrase, which extends your networking technology to connecting two different networks. Please remember that when you connect two different networks, your ambition is that any node on one network can talk to any node on the other network; otherwise there is no point. You are not interconnecting only the servers; you are interconnecting networks.
It is very obvious, then, that the Ethernet technology alone may not work; the numbers will simply be too many. The inter-networking technology will therefore use the notion of a router. The router will forward packets from a node on one network meant for a node on another network; a router is essentially a packet switching node. So if you want to connect one local area network with a second or a third local area network, you need a router, and consequently every isolated local area network must have a router mechanism. The router, as I said, is a packet switcher. It is something like a networking node: it must understand all the protocols, and it must have the additional capability of routing. You get regular routers as hardware products which contain the protocol stack and routing mechanisms, but you can also simply take a PC and load Linux on it; a modern Linux operating system, or even a Microsoft operating system, comes with a suite which can convert that PC into a router. So it is not uncommon to find that in some departments, local area networks are connected to each other through small Unix boxes working as routers. After all, a router is nothing but a special machine whose only job in life is to connect two networks. That is all they do: route packets. You must of course have at least one router per local area network, and your connection to anything outside your local area network is through that router. Is that understood? The router is the main thing here. This brings us to the notion of the wide area network. Whether you are connecting two local area networks within the same campus or the same building, or connecting them across cities, it really does not matter; it is a question of the speed you can get and the cost you will have to incur, but basically you connect multiple networks through routers, and that is what is shown here in the wide area network. You might have a network in Mumbai, a network in Nagpur, a network in Pune.
Each one of them has a router, and these routers are somehow connected. To go back to our previous discussion, these routers will be connected by point-to-point networks; they will not be on broadcast. So somehow the broadcast kind of packets within a local area network will have to be routed using point-to-point technology. Packets will have to travel. That means packetization and your communication protocol at some higher level should be oblivious to whether the medium is broadcast or a point-to-point connection; it should not matter. That tells you that the networking protocol itself must be built in layers: something which handles physical connections, something which talks to the application, something which operates at an intermediate level, and so on. We shall see those things when we study the protocol stack. So is this clear? This is how a wide area network would be built. Inter-networking of local area networks at many locations is done through routers, and routers are connected through point-to-point connections. Typically I go outside the campus, and that is where the ownership ends. The local area network is owned by me. Multiple local area networks could also be owned by me, in which case the interconnecting routers will also be owned by me; within a campus that is the situation that prevails. The moment I have to go out, I have to cross municipal boundaries, and I have to use a service provider's telephone lines, whether it is Tata, MTNL or BSNL, whatever. Then I will use either leased or dial-up lines going through what is known as the public switched telephone network or the public switched data network. So PSTN, PSDN and the like are called public networks, because anybody can subscribe to them. Many of you would have ADSL modems at home to which you connect your computer. So you get the internet service through BSNL, MTNL or whoever.
All that is happening is that your computer is being connected through a router to some other router at the back end. Notice that you do not have a local area network in your home; or rather, what you have is a local area network consisting of only one computer. In fact, if you play around with that router which the telephone department gives you, you can actually create a network of two or three computers on the same connection, and all of them can be simultaneously connected to the back end, because what you essentially have is a local area network connected through this router to the back end. In short, a wide area network is nothing but a network of networks. A wide area network is never a network of nodes, because a wide area network consists of routers connected to each other through point-to-point links, and the routers in turn handle the communication between any node on the local area network of one router and any node on any other network. Now we turn our attention to the protocol that each node must follow in order to communicate with any other node. This protocol, as I told you, must be able to handle things at the level of the physical media, things at the level of the application, things at the level of packetizing, and so on. Consequently it makes sense to consider the software for this protocol as comprising multiple layers. There is a great advantage in simplicity, in modeling, and in understanding of this protocol. There is also a great advantage in that I may be able to change one layer: if I keep the interfaces between that layer and the layers above and below it the same, then I can replace that entire layer by any other software without affecting the working of the protocol. So in the early days of networking this layering concept emerged: since networking software has to perform a number of functions, you use layering to define isolated boundaries for each function. Each layer implements some services through its own protocol suite.
So there is a networking protocol suite which comprises multiple pieces of software, or routines, or whatever; some pieces of program constitute the protocol for this layer, another set for that layer, and so on. The totality is the networking protocol suite. The service that is offered by the protocol suite can be connectionless or connection oriented. I shall explain the two words; they are very crucial words in networking, so you must understand the implication of both. We shall spend some more time understanding connection oriented versus connectionless protocols at a later point. First of all, if I have a layered model, imagine there are 1, 2, 3, 4, 5, 6, 10, 20 layers in my protocol suite. Then at a conceptual level I decide that the layering will be such that one layer on one node will effectively talk to, or communicate with, the corresponding layer on the other node. So the 15th layer here will talk to the 15th layer there. The first layer here will talk to the first layer there. That permits me to have identical functionality in these layers so that they understand each other. Within a node, each layer will of course have to interact with the layer above it and the layer below it, otherwise there will be no communication. Layers are not isolated completely. There is a single protocol stack which achieves end-to-end communication, and I have layered it for my convenience. So let's say I have a 5th layer, on top of which is the 6th and below which is the 4th. Then this layer must communicate with the 4th layer in order to accomplish something, and in turn must communicate with the 6th layer to receive work from the 6th layer. The assumption is that the work definition starts at the highest layer and the implementation of communication happens at the bottom-most layer. Layers are merely isolated in functionality. Now there are two important models for this layered protocol notion. One model is called the ISO model. ISO is the International Organization for Standardization.
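The idea that each layer talks only to its peer, while physically passing data up and down the stack, can be sketched as header encapsulation. The layer names and the bracketed header strings below are purely illustrative, not any real protocol's format:

```python
# On the way down, each layer wraps the data from the layer above with its
# own header; on the way up, the peer layer strips exactly that header and
# hands the rest upward. Peers "talk to each other" through these headers.
LAYERS = ["transport", "network", "datalink"]

def send_down(message):
    for layer in LAYERS:                      # application -> wire
        message = f"[{layer}]{message}"
    return message                            # what actually goes on the medium

def receive_up(frame):
    for layer in reversed(LAYERS):            # wire -> application
        header = f"[{layer}]"
        assert frame.startswith(header), "peer layers must match exactly"
        frame = frame[len(header):]           # each layer reads only its own header
    return frame

wire = send_down("hello")
print(wire)               # [datalink][network][transport]hello
print(receive_up(wire))   # hello -- every layer conversed only with its peer
```

This is also why a layer can be swapped out freely: as long as the header it emits and the interfaces above and below stay the same, neither neighbour notices the replacement.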
It defined a model which was known as Open System Interconnect, or OSI. So ISO OSI was one model. You may wonder why we bother with it, since nobody uses it now. This model came out as the cleanest model ever. A whole lot of academicians worked on it; a few industries worked on it, but ultimately all of them gave it up in favour of another model called the TCP/IP model, where TCP is the Transmission Control Protocol. We shall see what TCP is. That model was preferred, but effectively it was the ISO OSI effort which established that a clean layered model is the best way to represent protocol suites. And as we shall see, TCP/IP is not anything completely new or different; it merely combines some of the layers of the ISO OSI model and reduces the total number of layers. However, the TCP/IP model is the one which is most prevalent. So much for the layered model. The Open System Interconnect, or OSI, model of the International Organization for Standardization was a seven-layered model. It is interesting and important to remember the names of these layers, because they indicate the functionality that has been isolated in each layer. At the bottom-most level is the physical layer. That makes sense, because the physical layer means copper wire, wireless, whatever: media, and things which deal with media. Immediately above it is the data link layer: ultimately I have to transmit bits, and my medium may be analog or of a different kind, so I have to do some framing and control there; that is the data link layer. On top of it is the network layer. The network layer is the one where you understand node addresses, and you understand packets and so on. On top of it is the transport layer. The transport layer is responsible for ensuring the connection. The network layer will merely take the packets across the network; the transport layer, which is the controlling layer, will ensure effective communication between different nodes.
The top three layers, the session layer, presentation layer and application layer, all refer to activities that are happening on any one node. So if I am sitting at my computer and, let's say, I have logged in, then I am working in a session; let's say a secure session where I am running some queries. That is called one session. I will be doing some presentation; that means I would like certain graphics or text to appear in a certain way on the connected node. I deal with that using the presentation layer. And all of this I am doing under some application: I am running a regular reservation application, or I am running an email application. Consequently the OSI people decided that there should be three different layers, called application, presentation and session, which together define the work that is happening at a node — a node which is essentially either generating information to be exchanged or receiving information that has been sent by someone. The meaningful interpretation of that information at the receiving node, and the meaningful compilation of the information which is to be transmitted, is the job of the three layers at the top. This, in a nutshell, is how you should logically view the seven-layer model. Imagine that there is one node on the left-hand side and another node on the right-hand side. And imagine that all of these layers, by the way, are software: the networking protocol is a software stack. When I say application, the application may be an SQL engine that I am running. The SQL engine itself is not part of the stack, but that portion of the communication which is related to application-level activity will form part of this stack. So imagine that all these communication components are stacked together like this on one node, they are all stacked together on the other node, and both are physically connected to a physical medium.
And as we noticed earlier, if these nodes are across multiple wide area networks, then there may not even be a single physical medium. For some distance the traffic will travel through Ethernet cable, for some distance through telephone lines, for some distance through optical fiber. Somewhere else it may get converted into WiMAX or some wireless protocol, whatever; somewhere it may go up to a satellite. It doesn't matter: "physical medium" is what we use to represent all of that. What is connected to the physical medium is the physical layer, which is not shown here; that is the layer which connects the stack to the physical medium on both sides. Next come the data link layer, network layer, transport layer, session layer, presentation layer and application layer. And what is most important to remember is that this network layer logically talks to that network layer. That means the packetization it does, or the addressing it does, will be exactly identical to the packetization and addressing the other side understands. The protocols understand each other; that is why it is the same layer. That is how networking works internally. The physical layer is responsible for the transmission of raw bits. It recognizes 0 and 1, because voltage levels typically decide: there is a threshold voltage level, below which is one value and above which is another, and depending upon positive or negative logic, the lower value may be 1 or 0 and the upper value may be 0 or 1. Then there is the media, so there are mechanical and electrical issues. Typical media is copper: it could be a twisted pair or a coaxial cable. I will not go into the details of this; the idea is to reduce signal noise and ensure that the packets travel a larger distance. On optical fiber the packets can travel an even larger distance, because you are converting the signals into optical signals. Fibers have very large bandwidths — gigabits per second — and they have low attenuation.
That means a signal can travel as much as 30 kilometers on what is known as single-mode fiber. If you have multi-mode fiber, the distance may be less. In any case, in a wide area network the signals get attenuated; after some distance you cannot recognize the signal. So you need to have a repeater, which will amplify the signal and send it on again. Don't forget those large undersea cables that we saw — there was news that a cable burst under the seabed and there was a disruption of internet traffic. Often it is not the cable which gets into problems; there are a large number of repeaters which also have to be laid. And remember, the repeaters have to continuously amplify; that means they require some power. Where do you get that power from? It is fed through the same cable. And if one repeater fails, you have a huge problem. So that is a different kind of engineering altogether. I am just mentioning it to indicate that the physical layer is not all hunky-dory or simple; it has its own engineering issues. But they don't concern us at the protocol level. The data link layer actually takes charge of the raw transmission. So it has transmission and reception facilities: it can transmit bits, it can receive bits, and it can ensure error-free communication of those bits. It will send data frames, and receive and process acknowledgement frames. If there is a protocol which does not require acknowledgement, then it will only transmit and only receive. This is also the layer that controls access to the shared channel, as in the broadcast mechanism. It is an important layer, but again, we ordinarily do not come across it at all, so we can just note the existence of the data link layer and move on. Next is the most important layer, the network layer. This controls the operation of routing of packets across networks, and it does address resolution. What is address resolution?
Remember I said every node has to have an address. There is a destination address and there is a source address. In a local area network, I know all the computers are right here. But if there is an address which is a Kolkata address, how will I know whether it is a Kolkata address or a Coimbatore address? Secondly, the packet will go through routers. Routers themselves are nodes, so routers will also have addresses. So there is a huge address resolution problem, which is handled through tables of addresses; these are either static tables or dynamic tables. This layer also handles network congestion, and it provides for usage accounting — minor functions, but important ones. We shall see more of the network layer later. Basically I am just giving you an explanation of what each layer does; then we will immediately go over to TCP/IP, which is the most popular thing, and see what happens there. The transport layer is the middle one, between the application side — session, presentation and so on — and the network layer. This transport layer takes the message from the higher level. Remember I said you might want to transfer a file. It could be a few bytes; it could be a few gigabytes. This file originates on your application side somewhere — the session, presentation, application layers. Now this file has to be transferred. So the transport layer takes charge of this message and converts it into packets. Similarly, at the other end, the transport layer will receive packets from the network layer and reassemble them in the right sequence. So the transport layer is the intermediary between your message — or payload, as you may call it — and receiving that payload in the correct sequence. This is a true end-to-end layer. You know that all the lower layers are merely there to ensure that the exchange happens and the communication takes place well.
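The transport layer's two jobs just described — cutting a message into numbered packets, and reassembling them in the right order at the far end — can be sketched directly. The segment size of 5 is an arbitrary illustrative choice; real TCP segments carry byte-offset sequence numbers, not simple counters:

```python
import random

def segment(message, size):
    """Cut a message into (sequence_number, payload) pieces of at most `size`."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(segments):
    """Restore the original message from segments received in any order."""
    # sort by sequence number, then concatenate the payloads
    return "".join(payload for _, payload in sorted(segments))

msg = "a file of arbitrary length"
segs = segment(msg, 5)
random.shuffle(segs)                 # the network may deliver packets out of order
print(reassemble(segs) == msg)       # sequence numbers restore the order
```

This is why the transport layer is called end to end: only the two endpoints ever see `msg` as a whole, while every layer below sees nothing but individual packets.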
But the message comes from your application into this layer, and this is the layer which ensures that you receive your message correctly or transmit your message correctly. The next layers are comparatively simple. The session layer establishes sessions to provide dialogue services, synchronization, etc., and the presentation layer resolves common representation issues. You might be transmitting ASCII-coded text, but the machine at the other end understands only EBCDIC; somebody has to convert this. So all these things — text coding, etc. — are taken care of by the session and presentation layers. The application layer again has several standards, depending on what kind of application you have. For example, the virtual terminal is an application. That means I have a PC, I have a keyboard and monitor, and that keyboard and monitor are controlled by the operating system of this PC. But imagine that I want to make this computer effectively a terminal to some mainframe; I don't want any other functionality here. Now, that mainframe might be transmitting and receiving packets. So my communication protocol up to the transport layer works on getting and sending these packets, but the application layer should make my computer behave as if it is only a terminal, nothing else. I should get a prompt from the mainframe; when I type something, those characters should go there; when the computer responds, those characters should come back. This is called a virtual terminal application. It was one of the earliest applications. Email everybody is familiar with; so there is an email application. You can invoke that mail application; it will collect email from you, send it, receive transmitted emails, and show them to you with a variety of functionality. All of that happens at the top. Everybody is familiar with FTP, or the file transfer protocol. This is typically used to transfer interesting pieces of music stored on a friend's computer to your computer.
This is the most common use of FTP that I have seen, apart of course from using it to download some useful software once in a while. Directory lookup is another application; we shall have more to say about directory lookup at a later stage. These are just examples, and you can think of any number of applications there. But the reason I listed these particular applications is that they are now considered standardized applications, and the protocols or frameworks for these standardized applications are part of the standard protocol suite; I do not have to do anything separately to get them. So this is the ISO reference model. As I promised, we will now turn our attention to TCP/IP. Actually I will often say just TCP, but the full name is TCP/IP: Transmission Control Protocol, Internet Protocol. IP stands for Internet Protocol. Now, the TCP/IP reference model. A lot of people were working on the ISO model while the networking notion itself was being developed — remember, there was no networking before that time. I am talking about the mid-60s, when interactive terminals had just come in, and those were connected directly, physically, to a mainframe at the place where you had it. The only networking that was becoming possible was that you could take a terminal and connect it through a modem and what is known as a serial link, or RS232 interface. Those of you who are studying electrical engineering might know that RS232 is a serial link interface which connects two disparate devices. That was the only networking available earlier, to connect a terminal to a mainframe; there was no notion of networking between two independent computers. So the first to evolve was the ISO model. Simultaneously, the ARPANET was evolving. ARPA, by the way, was a US Department of Defense agency called the Advanced Research Projects Agency.
And it set up a network using four computers in 1969, which was called the ARPANET. That was the first experiment in the world to connect four independent, intelligent computers to each other using a network. This itself has evolved into the internet today. So what we call the internet had its beginnings in the ARPANET, and this work was done by the US Department of Defense. The internet itself was defined for the first time by Cerf and Kahn; 1974 is the first definition of the internet. So when we talk about the internet, it is actually a network of networks; inter-networking is the full term, and that is how the internet arrived in the world. Around this time the internet used the TCP/IP protocol suite, which was actually being implemented and tried at Berkeley. And Berkeley released that internet suite while it was simultaneously releasing UNIX versions through what was known as the Berkeley Software Distribution, or BSD. BSD was a very popular operating system distribution. After UNIX was created at AT&T Bell Labs, it was sort of open sourced, and Berkeley started distributing it. That was fine as far as the operating system was concerned. But because most of these people, not just in academia but even in industry, wanted to connect computers to each other, they would load UNIX, because the UNIX which Berkeley distributed contained the TCP/IP protocol. They did not wait for any standardization; they released their own version of the protocol. TCP/IP has therefore been part of UNIX from the very early days of the UNIX operating system. And because that protocol existed there, everybody started connecting to other computers using dial-up connections and whatnot, and that is how this model flourished. So before the ISO community could even take stock of how mature their own model was, they found the rest of the world was already using the TCP/IP model de facto. So while the ISO model is academically neat, TCP/IP is the de facto standard.
Now nobody talks of the ISO model, except that it is defined in the textbooks as a clean model. TCP/IP is the implemented protocol suite, and this is the reference model for it. You can see HTTP, FTP, Telnet. Telnet, you know, is basically remote terminal emulation. FTP everybody knows: the file transfer protocol. HTTP is the hypertext transfer protocol, which we shall see in a subsequent session. All of these constitute the application layer in the TCP/IP reference model. TCP is the transport layer; it is called the Transmission Control Protocol. And IP, the Internet Protocol or inter-networking protocol, is the network layer. Below that there is a data link layer and a physical layer. So the TCP/IP reference model has a physical layer, a data link layer, an IP layer, a TCP layer, and everything else on top in the application layer. Instead of 7 layers it has 5. Out of these 5 layers, the application layer constitutes the standardized applications; we need not worry about them except to use them. The data link layer and the physical layer, again, we need not worry about, because we don't care how those things work. The TCP layer and IP layer are the most important layers. While we need not know exactly how the algorithms are implemented, from our perspective of effectively communicating information across nodes it is important to understand the basics of both TCP and IP, which is what we will do shortly in this session. So is this clear? This is the model that we use now. One is the transport layer, the other is the network layer. First, the internet layer — the network layer, or IP layer. It is a connectionless, packet-switched layer. We will see what connectionless means. Packet formats are defined in this layer. What is the packet format? Each packet is called an IP packet. The maximum length of an IP packet is 65,535 bytes. Any idea where this number comes from? It is 2 to the power 16 minus 1 — about 64 KB, because the length field in the packet is 16 bits.
Usually it is about 1500 bytes, but that's a rule of thumb. Each packet can be as big as that maximum. But a packet cannot be arbitrarily small — it cannot be 2 bytes, for instance. We shall see why: there is a definition of the IP packet, and the minimum fields in that definition must exist in every packet. The rest of it is called the payload, the content that you want to transfer. So effectively the IP layer is responsible for taking a packet and pushing it across, or receiving a packet and pushing it up. It will receive packets from the top, forward them to the layers below, and thereby talk to the corresponding IP layer on the other side. This layer controls the operation of routing of packets across networks. It handles network congestion and it does usage accounting — limited functionality from our perspective. Most important, it controls the routing of packets across networks. We talked about this earlier. In any network, first of all, I should be able to uniquely address a node. If IP is my basic packet technology, where packets are being sent and received, then each packet must carry where it is being sent and who has sent it. Consequently I should be able to uniquely identify the originator of the packet and the recipient of the packet. This is handled through IP addressing. IP packets have a unique addressing scheme, which is described in the next few slides. Each IP address is a 32-bit address. Here is a class A address. An address is recognized as class A when it begins with zero. You can see there are four bytes — one, two, three, four — which constitute 32 bits, right? Out of these, the first byte is used to give a 7-bit network address, and the remaining bytes give a 24-bit host address. If the first bit is zero, then the whole address is recognized as a class A address. What does this mean? How many computers can I address, across how many networks? The first bit is zero, so only seven bits remain for the network; that means 2 to the power 7, which is 128 different networks that I can address.
If I have geographically arranged these networks, at the highest level I may say India is one network, the US is another network, Europe is a third network, Africa is a fourth network. I can have as many as 2 to the power 7 different networks. Within each network I am capable of addressing hosts with 24 bits. How many host computers is that? These are called hosts because they are slightly better than just an ordinary node. It should be very obvious to you that you and I will not be sitting on a machine which has a class A address. The class A address is therefore meant for some very high-level components of the inter-network. So: 2 to the power 7 networks, and within each network up to 2 to the power 24 hosts. 2 to the power 24 is a very large number — about 16 million. So within each of the 128 networks I can address up to 16 million nodes. It is quite likely that one of the hosts in IIT Bombay could be one of those 16 million nodes in India. But that node will not be something that you and I will be using; that will be the main networking router, maybe, or a main component which handles the entire networking, or something sitting at a higher level in Mumbai. That is why it is called a class A address. Is that clear? Any 32-bit combination which begins with zero in the address field automatically means it is a class A address, and its interpretation is like this. The next level is the class B address. A class B address always begins with one-zero: the first bit is one, the second bit is zero. If that is so, then the entire address is taken to be a class B address. The class B address reserves 14 bits for the network and 16 bits for the host address. How many networks are possible then? 2 to the power 14. Within each network, how many hosts? 2 to the power 16, which is roughly 65,000 (65,536, to be exact).
So again, I have roughly 65,000 hosts in each of the 2 to the power 14 networks. This is called class B, the next level. There is a third address class, called class C. It begins with one-one-zero: the first three bits are 1, 1, 0. If so, then the address uses 21 bits for the network address and 8 bits for the host. 8 bits means how many machines? 256. Clearly this is where I am likely to find my machine. My machine will be connected to a network, and my machine will have an IP address which will most likely be a class C address. If it is exposed to the internet, then my machine will have a class C address. This class C address gives me a unique number amongst 256 numbers, but those 256 nodes can be on any one of 2 to the power 21 networks, which is a very large number. Class D we need not really bother about; it is used for multicast addresses. When I want to multicast to a large number of nodes, I will have one-one-one-zero and the rest of it will be a multicast address. We do not bother with class D, and we do not bother with class E, which is reserved for future use. In fact, the future of this entire addressing scheme has already run out, because the scheme is no longer adequate. Why? The total number of computers in the world today is more than what you can uniquely address through this scheme. Remember, in '69, '70, '71, '72, '73, '74, when the scheme was evolving, everybody thought that a 32-bit address was a huge thing. Don't forget that the first network was built connecting just four computers. The total number of computers in the world was very limited, and people genuinely thought that in the entire lifetime of networking, 32 bits was a huge number; you would never require more addresses than that. The internet has since undergone another version: the new version, called IPv6 or version 6, now uses 128 bits for addressing. The old protocols which ordinarily run on your computer and mine cannot understand this IPv6.
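The classful splits described above can be captured in a few lines: the leading bits of the first octet decide the class, and hence how the 32 bits divide into network and host parts. This is a sketch of the scheme as the lecture presents it; the example octets at the bottom are illustrative.

```python
def classify(first_octet):
    """Return (class, network bits, host bits) for a classful IPv4 address."""
    if first_octet < 0b10000000:        # leading bit 0   -> class A
        return ("A", 7, 24)             # 2**7  networks, 2**24 hosts each
    if first_octet < 0b11000000:        # leading bits 10 -> class B
        return ("B", 14, 16)            # 2**14 networks, 2**16 hosts each
    if first_octet < 0b11100000:        # leading bits 110 -> class C
        return ("C", 21, 8)             # 2**21 networks, 2**8  hosts each
    return ("D/E", None, None)          # 1110 multicast / 1111 reserved

for octet in (10, 172, 192, 224):
    cls, net_bits, host_bits = classify(octet)
    if host_bits:
        print(f"{octet}.x.x.x -> class {cls}: "
              f"{2**net_bits} networks of up to {2**host_bits} hosts")
```

Adding up all three classes still gives far fewer than 2 to the power 32 usable addresses, which is one reason the scheme ran out even faster than the raw 32-bit limit suggests.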
But when the protocol is upgraded and the network cards are upgraded to handle 128-bit addresses and so on, the whole world will migrate to this kind of addressing. IPv6 addressing, incidentally, is supposed to cover several computers for every square meter of the known surface of the earth, not counting the oceans. So even with multiple computers on every square meter of land, you would still be able to address them all. All of you think that is a large number, right? Except that even while IPv6 is coming up, we already have problems, because a single individual carries gadgets which are not necessarily conventional computers, and these gadgets are becoming powerful enough to demand an IP address. I might have a watch which is connected over IP to another digital watch somewhere. I have a smart card. My ring might be an IP device, maybe. And there might be an embedded chip in my hand, which I may use to shake hands with you and automatically collect money from your bank account into mine. So a single individual, moving around, may be carrying 4 or 5 computers. At home I already have computers in my television, in my refrigerator — in practically every gadget that you can think of. And who knows, in future, while travelling from the airport to home, I might want to give orders to my appliances. I can't order my refrigerator to do anything useful, but let's say I want to order my microwave oven, in which my wife has kept my tea, to heat it up so that the moment I enter the home I get a hot cup of tea. Exactly 60 seconds is what is required, so exactly 60 seconds before I enter, I will send the command. I might want to do fancy stuff — you can think of many more useful applications; this is a dumb one. But if I want to do that, I want to be in communication, maybe using my smart card, with the microwave oven in my house, and every other person might want to do the same thing. Whether we will manage even with 128-bit addresses or not is not known.
I think we should take a logical break at this stage.