So we compare peer-to-peer systems against client-server systems. A client-server system you already know: we have a centralized server, and the clients use that server to access resources. Whereas in a peer-to-peer system, you can think of the clients, the end-user computers, sharing their resources amongst each other. What are some of the practical benefits of storing amongst the clients, as opposed to relying on a central server? We can distribute storage across the clients, across the peers. For example, if we're talking about storing files, in a client-server system the server stores all the files and the clients access those files from the server. So the storage is all done at the server. In a peer-to-peer system, the peers, the end-user computers, store the files, so the files are distributed amongst all the peers. You generally have much more storage space available, because the sum of the storage provided by hundreds of thousands of clients can be much larger than that of a single server. So by distributing storage of files across multiple computers, we can get more storage space. In a similar way, we can avoid a performance bottleneck. When we transfer files from an individual server, the bandwidth going to that server limits our data transfer performance. If information is shared on multiple different peers, then the rate at which we can transfer it depends upon the bandwidth available at multiple different peers, not at a single server. So instead of having one bottleneck in our network, because we can access resources at multiple different locations, we can avoid such a bottleneck.
An example is that instead of thousands of people downloading from one server, those thousands of people can download from multiple different peers, so distributing the load across the network. Another thing that peer-to-peer systems can allow in some cases is using the knowledge of the different peers to help solve some problem. One example, and we will not go into any more detail, is a social network, or having end users perform some classification. In a client-server system, there's some central entity that, say, classifies information, sorts out information. In a peer-to-peer system, each of the users has a role in classifying that information. Potentially you can classify the information much better if you rely on many more users. Think of something like Wikipedia: the idea is that many different users contribute information to the system. As opposed to a centralized system where just one source contributes information, we can have a potentially much larger amount of information if all peers are involved. We're going to focus mainly on storage and some aspects of bandwidth. What we want to do is look at some general mechanisms with which a peer-to-peer system can be built. There are different ways to classify peer-to-peer systems. There are peer-to-peer applications, software you download and install; peer-to-peer protocols, protocols for exchanging packets that support those applications; and algorithms. So there's a mix. When someone says a peer-to-peer system, it may refer to any of those things. We're going to focus on some general protocol mechanisms. Some examples of peer-to-peer applications you may have heard of: Gnutella, Napster, and BitTorrent is another one. There are also peer-to-peer platforms or software architectures that allow you to develop applications.
There are different algorithms that are used for searching in peer-to-peer systems. We're going to look at some of those general algorithms or protocols that a peer-to-peer system can be built upon. Go back to a client-server system. You want to download a file using a client-server system. The server stores all the files, say a list of a thousand different files. You as a client want to download one of those files. To find that file, all you have to do is contact the server and say, do you have this file? And if the server has the file, you can download it from that server. Finding a resource in a client-server system is very easy because the server has those resources. So to find that resource, where the file is the resource, you simply contact the server. In a peer-to-peer system, we distribute the resources, for example the files, amongst many different computers. So there's an additional challenge: how do you find those resources? Resource location is a main challenge in peer-to-peer systems. We'll generally talk about a resource. The best example is a file, but it doesn't have to be a file. The resource can be some compute capability, some capability of a computer to perform some operations. In a centralized network, to find a resource, you go direct to the server. But in a peer-to-peer network, those resources are spread around different computers. So how do you find them? Let's introduce some notation, and we'll use that to describe some different algorithms for finding resources. In a peer-to-peer system, we can say we have a group of peers. Think of them as computers spread across the internet, for example. A group of peers, G is the group. Each peer has some address. In the internet, that address may be an IP address, maybe with a port number for the application. So each peer can be identified. We'll use the notation P, with a subscript to indicate the peer ID. And those peers store resources.
For example, think of resources being files. We can identify those resources by some unique key. You may think initially of it as a file name. So we have a set of computers in the internet. They want to share files. Then we need some unique identifier for those files. A file name may be a unique identifier, but in fact you can have two different files with the same file name, so file names are not unique in many cases. Another way to get a unique identifier for a file is to take a hash value of the file. We'll talk about that later if we need it. But assume that each resource has a unique key, an identifier. So we have peers, resources and keys. Peers store resources. Resources are identified by keys. The goal for resource location is this: you're looking for a particular resource, and assuming you know the key for that resource, you need to find the peer that stores it. Remember, in a peer-to-peer system, when we're distributing resources amongst peers, the resources are not on an individual server. They could be on one of many different peers. Let's give a simple example. A simple peer-to-peer system with four peers, P1, P2, P3, P4. Four computers on the Internet. I call them P1 to P4. They could be identified by some IP address, some port number. But they are four computers anywhere on the Internet, not in the same LAN. They store some files. And instead of writing the file names, I'll just say resource R1 for some file. P4 has some files, R1 and R3. P3 stores some files, R1 and R4. P1 has R2. And P2 has R3. So here we have four peers. Each of them stores some resources. Each resource has a key. And the notation we'll use is that resource R1 has a corresponding key, K1, R3 has K3, and so on, with the same subscript. Now, P1 wants to find a resource. It has some key, K3, where the key is a unique identifier for the resource it wants.
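The notation above can be written down as a small sketch. This is only an illustration: the lecture says a hash of the file can serve as a key but does not name a hash function, so SHA-256 is an assumption here, and `resource_key`, `key_of` and the sample file contents are hypothetical names and values.

```python
import hashlib


def resource_key(content: bytes) -> str:
    """Derive a key from the file's bytes rather than from its name.

    SHA-256 is an assumed choice; the lecture only says "a hash value".
    """
    return hashlib.sha256(content).hexdigest()


# Two different files that happen to share a file name (made-up contents).
file_a = ("notes.txt", b"lecture on peer-to-peer systems")
file_b = ("notes.txt", b"shopping list")

key_a = resource_key(file_a[1])
key_b = resource_key(file_b[1])

# The four-peer example from the board: which resources each peer stores.
peer_resources = {
    "P1": {"R2"},
    "P2": {"R3"},
    "P3": {"R1", "R4"},
    "P4": {"R1", "R3"},
}


def key_of(resource: str) -> str:
    """Lecture notation: resource R3 has key K3 (same subscript)."""
    return "K" + resource[1:]
```

The two files have the same name but different keys, which is the reason for hashing the content instead of relying on the file name.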
They have a key, and now their challenge is to find out which peer in this network has that resource. So P1 is looking for the resource identified by key K3. For resource location, we need some algorithm such that P1 can find out that, in this case, either P2 or P4 has the resource. That's the objective of resource location: find which peers have that resource. Once you find them, you can contact them and download or access that resource. Now, if we think of the resources as files and, to keep it simple, the keys as file names, then P1 wants to find some file K3. It knows the file name, it wants to download this file, therefore it needs to find the peers that store that file, and we'll go through different approaches for doing that. So that's the resource location problem. It involves using some form of an index, where an index maps keys to peers. An index of this system could be: K1 is available at P4 and P3. K2, which really means resource R2, is at P1. K3, meaning resource R3, is at P4 and P2. And do we have another one? K4 is at P3. That would be an example of an index in this case. It tells us where those resources are located based upon the key. So, if we know this index, then when P1 is searching for the resource identified by K3, it's quite simple. Look up the index: K3 is at P4 and P2. So if we know the index, resource location is easy. But the problem is, how do we know this index? How do we know that K3 is at P4 and P2? And where do we store this index? In a peer-to-peer system, one of our potential advantages is that we distribute the functionality across many different computers. We don't have a central server that stores this index, at least in some systems. So if we want to distribute this, then we may store parts of the index on different peers. We may store the entire index duplicated on each peer, or in fact store no index information on any peer. We'll see some different approaches.
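As a rough sketch of what such an index looks like as a data structure, the mapping from keys to peers can be built by inverting the "which peer stores what" table from the four-peer example. The function name `build_index` is hypothetical, and where the index is stored (one server, every peer, or nowhere) is deliberately left open here.

```python
from collections import defaultdict

# The four-peer example: which resources each peer stores.
peer_resources = {
    "P1": {"R2"},
    "P2": {"R3"},
    "P3": {"R1", "R4"},
    "P4": {"R1", "R3"},
}


def build_index(peer_resources):
    """Invert peer -> resources into key -> set of peers."""
    index = defaultdict(set)
    for peer, resources in peer_resources.items():
        for resource in resources:
            key = "K" + resource[1:]  # R3 is identified by K3
            index[key].add(peer)
    return dict(index)


index = build_index(peer_resources)

# With the index in hand, resource location is a single lookup.
print(sorted(index["K3"]))  # ['P2', 'P4']
```

The lookup itself is trivial; the hard parts discussed next are who holds this table and how it is kept up to date.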
So our goal is to be able to locate resources. We've just introduced some terminology at the moment. In our small network, storing the entire index on each peer is not a problem. But now imagine a network with 1,000 different peers spread across the Internet. The index will change over time. The set of resources available will change over time. A new file becomes available. A new peer becomes available. The index changes. Trying to update the index in 1,000 different locations so that it stays consistent is very challenging. So normally a peer will not store the entire index, only a portion of it. And since peers do not store the entire index, they may have to send a request to their neighbors to locate resources. So if P1 did not have this index, it may have to send a request to other peers saying, where is K3? If they store the index, then they can send the answer: K3 is at P2 and P4. So the index may be distributed amongst different peers. For this to work, we need some way to allow peers to enter the network and to leave the network, to join and leave a group. That is, to know about the other peers in the network. For example, if P1 is going to send a request to another peer, which one? Again, think of these four computers as four random computers in the Internet. They're not in the same LAN as you. For you to join this peer-to-peer network, you need to know some of the existing peers in the network. So there needs to be some way to join a group. And once you join the group, you need a way to search, that is, given a key, find the resource, or find the peer storing that resource, and also to insert new resources and delete old resources. That is, if resource R3 is no longer available at P2, then we need to delete that resource from the network, somehow updating the index so that this entry disappears.
So we have resources, and we need to be able to find them, add new resources, remove old resources, and manage the group. How do we do those things? First, managing the set of peers in a group. There are four peers in our peer-to-peer network. A new node comes along, P5. It wants to join the group, to join this network. What does it do? It needs to know an existing member of the group. To join this group, it needs to know the address of one of the existing four members. How does it know that? Well, either the addresses are published somewhere, and we'll see some specific approaches, or there's maybe some other special server that keeps track of the peers in the network. But somehow it needs to know the address of at least one other peer. Once it knows one other peer, it sends a generic message to that peer saying, I want to join the network. Assuming that's accepted, the new peer exchanges information with existing peers. So P5 may send a message to P2 saying, I want to join the network. If that's acceptable, then P2 may tell P5 about some of the other peers in the network, the peers that it knows about. So when a node wants to join the peer-to-peer network, it needs to learn about other peers in the network. Similarly, if a node wants to leave a group, it normally will tell another peer, I'm leaving the group. And those other peers can update, for example, their index information or other information to keep track of who's in the group and who's not. The three main operations for data are search, insert and delete. Search takes some input key. It should return the address of the peer, or the set of peers, that have the resource corresponding to that key. So if P1 performs a search for K3, what values should that return? It should return P4 and P2 in this case, because they store the resource R3, which is identified by key K3.
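The join and leave steps described above can be sketched in a few lines. This is a minimal in-memory model, not a network protocol: the class `Peer` and its methods `join`, `handle_join`, `leave` and `handle_leave` are hypothetical names, and the order in which P1 to P4 joined is an assumption made only to populate the example.

```python
class Peer:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.known = {}  # peer_id -> Peer: the other peers this peer knows

    def handle_join(self, newcomer):
        """Accept a newcomer and tell it which peers we know about."""
        introduced = dict(self.known)
        introduced[self.peer_id] = self
        self.known[newcomer.peer_id] = newcomer
        return introduced

    def join(self, bootstrap):
        """Join by contacting one existing peer whose address we know."""
        self.known.update(bootstrap.handle_join(self))

    def handle_leave(self, peer_id):
        """Forget a peer that announced it is leaving."""
        self.known.pop(peer_id, None)

    def leave(self):
        """Tell the peers we know that we are leaving the group."""
        for peer in list(self.known.values()):
            peer.handle_leave(self.peer_id)
        self.known.clear()


p1, p2, p3, p4, p5 = (Peer(f"P{i}") for i in range(1, 6))
p2.join(p1)  # assumed history of the existing group
p3.join(p2)
p4.join(p2)

p5.join(p2)  # P5 only needs to know P2's address
known_after_join = sorted(p5.known)

p5.leave()
```

After the join, P5 knows P2 plus every peer P2 knew about; after the leave, P2 no longer lists P5. Note that in this simple sketch peers such as P1 only hear about P5 if P5 contacts them, which is one reason real systems need more elaborate group management.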
The difference between the resource and the key is that the resource is the actual, say, file. The key is just a short identifier of that file. So that's the idea of search. We need some algorithm and protocol to do that in the network. Insert is: given a new resource and its corresponding key, add that resource to the peer-to-peer network. Depending upon the system, that can be done in different ways. We'll see that when we go through different examples. And delete is: remove some resource with a particular key from the network. How do we implement these functions? There are different approaches for doing it. We're going to go through, I think, three basic approaches, and see how search, insert and delete work in each of them. The different approaches can be classified in different ways. One of them is whether the approach is structured or unstructured. In an unstructured approach, there's no information about the resources kept on other peers: peers do not hold index information about other peers. In a structured approach, some peers store information about what other peers store. So if this index was stored on all five peers, which approach do we have? Structured or unstructured? Common exam question. If the index that I've drawn on the board is stored on all five peers, is this a structured or unstructured approach? It's structured, because there are some peers storing information about other peers. So the difference is that in a structured approach a peer, for example peer two, stores some information about what peers one, three, four and five have. If we store this index on P2, P2 knows something about the other peers. That's a structured approach. Unstructured is when a peer knows nothing about what other peers store. We'll see some examples of that. Then flat versus hierarchical. In a flat peer-to-peer system, all nodes are the same. In a hierarchical one, some nodes have special functionality.
Let's go through three different approaches and see this in play. We'll go through Napster, Gnutella and another one called FastTrack. On these slides there's a little bit of history of where they come from. They were three originally popular peer-to-peer file sharing protocols. Napster was probably the first popular peer-to-peer file sharing application. We're not going to talk too much about how it was used. We're going to focus on the technology behind it and what the protocol was, the basic architecture for discovering resources in the network. Same with the others. We will not go on so much about how they were used but go direct to how they worked, and then compare them after. And I'm going to do it on the board. We don't have any pictures here. Napster was a directory-based architecture. It's a peer-to-peer system in some senses, but in others it's considered a client-server system. So it's a mix, in fact. How it works is that there's some central server which is used to locate resources, in particular to store an index. The clients that want to find resources contact the central server, and once they find the address of the peers that store the resources, the clients contact those other peers directly. Think in the context of downloading a file. In a pure client-server system, you contact the server and download the file from that server. In Napster what we have is a server and some clients, which I'll call peers. So we have a special server and we have our set of clients, or peers in this case, the end-user computers. And let's say these peers again have resources. I don't know if it's the same as over there, but we'll make up some values. So these are computers. They have a set of resources. We want to make those resources available in the network. How do we do it with Napster? Well, essentially the server stores an index of these resources.
So what happens is that a peer, when it wants to join the network, contacts this server, so it needs to know the address of the server. That's a well-known address, an IP address and a port number for example. The peer contacts the server with a join message: I want to join the network. The peer tells the server its own address, its peer identifier, and the resources that it has. And the server stores an index. As we said, it stores, for a particular key, the location of that resource. So K2, P1: that is, the resource with key K2 is stored at P1. So when P1 joined the network, it would contact the server and send a join message, I want to join the network. I will not draw all of it, but it would then send an insert message. What we saw with insert is that we have a new resource, R2, identified by key K2. This is really the peer telling the server: I have resource R2, identified by key K2, and it's stored at P1. Therefore the server updates its index, saying K2 is available via P1.
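A minimal sketch of the directory server's side of this exchange might look as follows, assuming the four operations are plain method calls rather than network messages. The class name `DirectoryServer`, its method names and the entries inserted for P2 and P4 are illustrative assumptions, not the actual Napster message format.

```python
class DirectoryServer:
    """Central index: maps each key to the peers that store the resource."""

    def __init__(self):
        self.peers = set()
        self.index = {}  # key -> set of peer ids

    def join(self, peer_id):
        self.peers.add(peer_id)

    def insert(self, peer_id, key):
        self.index.setdefault(key, set()).add(peer_id)

    def delete(self, peer_id, key):
        holders = self.index.get(key, set())
        holders.discard(peer_id)
        if not holders:
            self.index.pop(key, None)  # no peer has it any more

    def search(self, key):
        # Return a copy so callers cannot modify the server's index.
        return set(self.index.get(key, set()))


server = DirectoryServer()

# P1 joins and tells the server it has the resource identified by K2.
server.join("P1")
server.insert("P1", "K2")

# Further illustrative entries (assumed values).
server.join("P2")
server.insert("P2", "K4")
server.join("P4")
server.insert("P4", "K3")
server.insert("P4", "K1")

holders_k4 = server.search("K4")  # one query, one answer
server.delete("P1", "K2")         # P1 no longer offers R2
holders_k2 = server.search("K2")
```

The server only ever stores keys and peer identifiers; the resources themselves stay on the peers and are fetched from them directly.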
So there's some message sent from peer to server informing it of the resources available, and each peer that joins the network does the same. They join by contacting the server and then inform the server which resources they have available. Over time, as peers join the network, the server index is updated. So after these four peers have joined, we'd have K4 at P2, K2 at P3, and K3 and K1 at P4. Each peer informs the server of the resources it has available, and the server maintains an index. This part is not peer-to-peer; this is like a normal client-server system. Think of a normal server where you download files: that server stores an index of the files it stores, the file names. Except that in a pure client-server system, the file is actually stored on the server. Where Napster is a peer-to-peer system is that the resources, the files, are not stored on the server. They're stored on the peers. So a peer joins by contacting the server and inserts its resources by telling the server what resources it has, and the server maintains an index. Now a peer comes along and wants to find a resource, say P4. What it does is send a query to the server, a search query, searching for K4. P4 wants to find the resource identified by K4. Think of K4 as a file name or a unique identifier of a file. It sends a query to the server, and quite simply the server looks up K4 in the index. Look at the index: K4 is available at P2. The server sends back a response, some answer. It doesn't matter what the type of message is called; the idea is that it tells the peer that the resource is available at P2. P4 then contacts P2 with some request, and if this resource is a file, it downloads the file. The protocol for contacting the peer and downloading the file depends upon what the application implements. It could be simply an HTTP request and response, or some file transfer protocol. So once you learn the address of the peer that has the resource, you contact that peer and access the resource. If it's a
file, you download it; if the resource is the ability to do some computations, then you access that peer and tell it to do some computations. So the peer-to-peer part here is the distribution of the files among the peers and contacting the peers directly for the files. The server just stores the index; it doesn't store the actual files. We join the network by contacting the server. We insert resources by contacting the server. Similarly, we delete resources: if P1 no longer has R2, it sends a delete message to the server, and the server simply updates the index. To search, we send a query to the server, it responds with the peer that has that resource, and then we contact that peer directly. So that's the basics of Napster. Napster in its original form was very successful because it was used by end users to download files, to share files. The end users have files on their computers, on their hard drives. The Napster server keeps an index of which end users have which files. So to find a file, you contact the Napster server, and that will tell you where to download the file from. But because most of the files were copyrighted, it was illegal to provide that service. That's why Napster was shut down: providing the server to locate the copyrighted material was deemed illegal in that case. So this is a mix. It's a peer-to-peer system in that we're accessing the peers directly to access the files, but we still rely on a central server for the index. If the server fails, the entire system will not work because we can no longer search. That's a problem in this case. We say it's directory-based: this server stores a directory of the files or the resources, an index. It's very efficient for searching. To find a resource, we send one query to the server and immediately get an answer, because the server has the entire index. So it's very efficient for searching, but it's not so good for fault tolerance. What does fault tolerance mean? The ability to
handle something going wrong. If the server fails, the entire system fails. The system will not tolerate a fault at the server. If P2 fails, that computer is not working, then the system will still work. It's just that the resources on P2 will no longer be available, but assuming those resources are replicated across other peers, we can still locate them. Then scalability: because every request goes to the server, there's an extra load on that server, and as the number of peers grows to tens of thousands or millions, all the requests are going to one central server and performance becomes a problem. So that's another issue. Let's look at a different approach, Gnutella. This is almost the other end of the spectrum, almost completely opposite among the different approaches we have available. We have a set of peers. Let's draw some to get started, and just to make the example a bit better we'll have a few more peers. What do we have? P6, P7, up to P9, that should be enough. We have some peers, and they have resources. Let's distribute some resources; it doesn't matter so much which. So we have a set of peers, and they have some resources available on them. In Gnutella we have a completely distributed system. There's no central server. It's an unstructured system in that we will not have a single index; in fact, the nodes will be quite simple in terms of what they store. It's a flat architecture in that all the nodes are treated the same. Napster, by contrast, is a hierarchical architecture in that we have a server which has a special role and then we have the normal peers, so there are different types of nodes, server and peers. In Gnutella everything's just a peer. Ignore "loosely coupled" on the slide; we're not going to try and explain that. Before we go into those details, let's use the board to illustrate what happens. Now we have nine peers spread across the internet. Where I've drawn them doesn't have to correspond to a geographical location. They're just nine computers somewhere in the internet. Each peer needs to know about some other peers. So when the network starts, there's one peer. When a
second peer wants to join the network, it needs to know that other peer. As more peers join the network, they need to know about some of the existing peers. We can talk about the peers that a peer knows about as its set of neighbors. So to get started, let's say that these nine peers know about some of the other peers in the network. They have some neighbors, and I'll show that by drawing some lines to connect them. That is, peer one has two neighbors, two and three, and I'll just connect some of the others together. Each peer needs to know about some other peers in the network, and we'll call those other peers neighbors. These are not physical links, but you can think of them as conceptual links. That is, P1 knows the IP addresses of P3 and P2, so it can send a message direct to P3 and direct to P2. Similarly, P3 knows the addresses of four, one and seven. Normally the set of neighbors that each peer knows is quite small. Consider a network with thousands, even millions, of peers: still you only have a handful of neighbors, three, four, five, or maybe a few more. If a new node wants to join the network, say P10 arrives, then it needs to contact one of the existing peers, so it needs to learn the address of one of them. Say somehow it discovers the address of P1. How it discovers the address of P1: either it's advertised somewhere, that is, the computer has found it from some web page, or there's again some special server that stores the addresses of selected peers. But a new peer needs to know at least one existing peer to join the network. Let's say it knows P1. Then it sends a special message to P1 saying, I want to join the network, and those messages in Gnutella are called ping messages. A new peer contacts an existing peer using a ping message. This is the join procedure. With Napster, we said that we send a join message to the server; here in Gnutella they have some specific names. How do we know that existing peer? How does P10 know the address of P1?
There may be some dedicated servers that list known peers, and you contact one of them. Then P1 responds with a pong message, and inside that message P1 tells P10 something about its neighbors. So P1 tells P10, my neighbors are 3 and 2. As a result, P10 now knows about P1, 3 and 2 in the network. It learns about other peers in the network. That response may include other information, not just the IP addresses of the peers but also information about the files available. Then P10 selects some of the peers that it has learned about to be its neighbors. So P1 doesn't have to be a neighbor of P10; this initial ping-pong exchange is just to learn about some other peers. P10 knows about 1, 2 and 3. Let's say it selects 1 and 3 to be neighbors, so it would send them a message and join the network. As a result, P10 will have two neighbors, P1 and P3. In this case it selected two neighbors; the number of neighbors it selects is usually, again, a small number. Sometimes the response from P1 may include more than just the neighbors it has, but also other nodes that it knows about. So that was just the process of a new node joining the network. It needs to establish connections with existing peers. If something goes wrong and an existing peer leaves the network, say P3 leaves, then there are procedures for P10 to find some more neighbors. Usually you need multiple neighbors for the system to work well. Now, how do we search for resources? Let's say P10 wants to find a resource. Let's try K3; here K3 will work. P10 wants to find the resource identified by K3, that is, resource R3. They want to find that resource and access it, so they need to find the peer that stores that resource. What do they do? There's no index stored in any of the peers. The approach that they use is essentially to broadcast a message through the entire network. A peer sends a query first to all of its neighbors, those neighbors send it to their neighbors, and they keep going until it reaches the peer that has that resource. So P10 sends a query to its two
neighbors, saying, I'm looking for K3. Of course, they don't have K3 in this case. P3 has resources 1 and 4, and P1 has resource 2. So those peers, P1 and P3, send the query to their neighbors. P1 will send to 2 and 3; you do not send back to whoever just sent you the message. P3 sends to 1, 4 and 7; it may not send to 1, depending on whether it has just received the query from P1. And what happens next? Do we find the resource? P2 will receive the query. It doesn't have R3, so it will send to its neighbor. At about the same time, P7 receives the query and will send to its neighbor. P4 receives the query. P4 has resource R3, and we were looking for resource R3, key K3, so we've found it. Now we send back a response saying the resource R3 is available at P4. So we send a response indicating where the resource is. Similar to the previous approach, we can then download or access that resource directly. So the search approach in this case is to use flooding. We send a message through the entire network, one peer to its neighbors, those neighbors to their neighbors, and so on, until of course the resource is found. Then a response is sent back. What does P10 receive? How many responses does P10 receive? It receives two, because there are two peers that store that resource. We receive a response from P4 and from P8. It only needs one. I mean, two is okay, but in theory it's looking for a single response. But of course P8 and P7 don't know that P4 responded, so they still process the query. P7 sends the query to P8, and P8 sends a response. P8 doesn't have to send the query on any further, so the query wouldn't go to P6 or P9. But it would go to P4, it would go to P5, and two responses will come back. So that's the basics of Gnutella. It's a fully distributed system. There's no server. There's no index, or in fact, if we think of an index, the only index is local to each peer. P2 knows that R6 is at P2. P2 doesn't know anything about what other peers store. So this is our unstructured
approach: the peers do not know about what's stored at other peers, whereas in Napster the server knows what's stored at other nodes. So that's a structured approach in Napster, an unstructured approach here. Napster is a hierarchical approach: we have different types of nodes, a server and the peers. Gnutella is a fully distributed approach, a flat approach. All peers are treated the same, except we may have a special server that stores the addresses of a selection of peers so that when a new node joins the network, it can discover an address and contact an initial peer. So those are almost the two endpoints of the set of design solutions for peer-to-peer systems: a centralized approach, except for the direct contact between peers, and a fully distributed approach where there's no central entity. Which one's better, Napster or Gnutella? Napster, why? Okay, search is efficient. To search for a resource, a peer sends a query to the server, and we get an immediate response because the server has an index of all resources. So a response comes back and we know the other peer. Search is very efficient: we send one search query and get one response. In Gnutella, to search we effectively send a query to the entire network. We broadcast to everyone in the peer-to-peer network. We send it to our neighbors, they send it to their neighbors, and so on, and we get multiple responses. So Napster is better in terms of search; we say it's more efficient, it's faster, with less overhead. Gnutella is not good for search because it takes time to get a response and we need to send many messages through the network to get one. What's good about Gnutella then? If P4 fails, the system will still work. Okay, the resources on P4 are gone, but the resources on all of the others will be available, and the search will still work because we'll get the query to P5, we'll get the query to these other nodes, even if one of these nodes fails. If our server
fails in Napster, the entire system fails. So that's a significant difference there. Of course, if a peer fails in Napster, that's not a problem, but the server is the central point of failure.

I think most people have seen flooding before and understand the concept, and that's what's used in the query here. We send a message through the entire network; we flood the network with the query. It's very inefficient, but it tolerates faults. It's very simple as well. There are many details of how that works and ways to optimize it, to reduce the number of messages sent: for example, instead of sending to all of your neighbors, send to just a selection of your neighbors, or put a time to live on the message so it doesn't go forever through the network. But we will not touch upon them. I think the main point that you need to capture is what we've covered on the board: the points about firewalls and the list of peers, the concepts we're demonstrating on the board.

So there are two options for peer-to-peer networks. Can we do better? What can we do differently?
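To make the comparison concrete, here is a minimal Python sketch of the two search styles discussed so far. It is an illustration only: the overlay links in `NEIGHBORS`, the function names and the message counting are assumptions made up for this example, loosely modeled on the board drawing, and not the actual Gnutella or Napster protocols. The flood applies the two rules mentioned above: a peer never sends the query back to whoever sent it, and a time to live (`ttl`) limits how far the query travels.

```python
from collections import deque

# Hypothetical overlay links (assumed for illustration, not the exact board drawing).
NEIGHBORS = {
    "P10": ["P1", "P3"],
    "P1": ["P10", "P2", "P3"],
    "P2": ["P1", "P5"],
    "P3": ["P10", "P1", "P4", "P7"],
    "P4": ["P3", "P5"],
    "P5": ["P2", "P4"],
    "P7": ["P3", "P8"],
    "P8": ["P7", "P6", "P9"],
    "P6": ["P8"],
    "P9": ["P8"],
}

# Each peer only knows what it stores itself (its local index).
STORED = {"P1": {"K2"}, "P3": {"K1", "K4"}, "P2": {"K6"}, "P4": {"K3"}, "P8": {"K3"}}


def flood_search(origin, key, ttl):
    """Gnutella-style search: return (peers that responded, query messages sent)."""
    seen = {origin}                       # peers that already handled this query
    queue = deque([(origin, None, ttl)])  # (peer, who sent it the query, hops left)
    hits, messages = [], 0
    while queue:
        peer, sender, hops = queue.popleft()
        if peer != origin and key in STORED.get(peer, set()):
            hits.append(peer)             # respond, and do not forward any further
            continue
        if hops == 0:
            continue                      # time to live expired
        for nxt in NEIGHBORS[peer]:
            if nxt == sender:
                continue                  # never send back to the previous hop
            messages += 1
            if nxt not in seen:           # duplicate copies are received but dropped
                seen.add(nxt)
                queue.append((nxt, peer, hops - 1))
    return hits, messages


def central_search(key):
    """Napster-style search: one query to the server, one response back."""
    index = {}
    for peer, keys in STORED.items():     # the server's index of all resources
        for k in keys:
            index.setdefault(k, []).append(peer)
    return index.get(key, []), 2


if __name__ == "__main__":
    print(flood_search("P10", "K3", ttl=7))
    print(central_search("K3"))
```

With this made-up overlay, the flood from P10 finds K3 at both P4 and P8 but needs 10 query messages to do so, while the central index answers with one query and one response. The cost of that efficiency is the point made above: remove the server and `central_search` has nothing to ask.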
Mix them, easy. Napster has the advantage of efficient search: it's very good for searching, very fast. Gnutella has the advantage that if one node fails the system keeps working. So let's try and mix them together and see if we can get both of those advantages. And that's what FastTrack does: it's a combination of these two schemes. We'll show it on the board, but think of it as having multiple servers. In some parts it uses a central server, and in the other parts it uses the Gnutella flooding approach: send a message to everyone. So it's a hierarchical system, in that there are different types of nodes, and it has what's called a super peer architecture. We have our normal peers and the special peers, which are called super peers.

Let's try and draw it on the board. I'm going to draw a set of nodes, and SP means a super peer. Initially think in this case of 12, 17 different computers in the internet. Just because I've drawn them next to each other doesn't necessarily mean they are physically next to each other. This P4 may be here in Thailand; SP2 may be in the US. So although I draw them next to each other, that's just the conceptual relationships between them; they may not be physically near each other. So there is a set of peers, and we'll distinguish between the normal peers, P, and the super peers, which are just other computers in the network.

Each peer joins the network by contacting a super peer. So for example, when P4 wants to join the network it needs to know the address of one of the super peers. It's similar in Gnutella: when P10 joins the network it needs to know the address of another peer. Where does it get the address from? In FastTrack, or a super peer architecture, P4 joins the network and needs to know the address of a super peer. Where does it get that from? A server. And what's the name of that server? Some special server here: a super super peer. As with all of these approaches, we need some special server that stores
the addresses of some of the nodes. In this case this super super peer stores the addresses of some of the super peers. In Gnutella we needed a special server (I didn't draw it) that knew the addresses of some of the peers, so that when P10 came along it contacted this special server and said, "Tell me the address of some peers," and that special server sent back P1, and therefore P10 sent a ping message to P1. In FastTrack it's similar. When a new peer wants to join the network it contacts the super super peer, the special server which knows the addresses of some of the super peers, and it responds saying, "Okay, here's the address of SP2." Then P4 contacts SP2 and joins the network via SP2.

So there's this process of joining the network, and it's the same process as in Napster: when P1 joins the network it contacts the server and tells the server what resources it has. I haven't drawn the resources yet, so let's distribute some resources randomly amongst the peers. Which resource doesn't matter so much; there are just some resources stored on some of the peers. And each peer knows about one super peer. So when P4 joined the network it found the address of super peer 2 and joined via super peer 2. That's similar to Napster, where each peer knows about the server, except in Napster there's just one server and all the peers contact that one server. In FastTrack, or in general a super peer architecture, there are multiple servers, or super peers. So we can say these lines indicate that the peers have joined via the corresponding super peer.

When they join, they then insert their resources into the network, in the same way as with Napster, where each peer tells the server which resources it stores. P1 told the server it had R2, so the server maintained an index: K2 is available at P1, K4 is available at P2. It's the same approach here. Each peer tells its super peer about the resources it has, so that super peer maintains an index. So super peer 3 would have: K6 is at P7, K2 is at P8, that is, resources 6 and 2 at peers 7 and 8. And the same with the
other super peers: they each have an index. Let's try and draw it. So that's a similar approach to what happens in Napster, but it's local, between the peers and just their super peer. These indexes are stored on the super peers. That's the process of inserting a resource: we tell the super peer which resources we have.

Now we want to search for a resource. We'll do different searches. Let's say P4 is looking for K1. What does it do? P4 is looking for K1; what do you think is going to happen? Where is P4 going to send the query? To its super peer, SP2. So think of it as the same approach as in Napster: you send it to your local server. So it sends the query to SP2, and SP2 looks in its index, searching for K1. SP2 has it in its index: K1 is available at P6. It sends back a reply. So the query is sent, the reply is sent back saying the resource is available at P6, and then P4 contacts P6 directly and accesses that resource. That's the exact same approach as in Napster: send a query to the server, the server looks in the index, if it's found it sends back the answer, and then the peer contacts the corresponding peer. So that's easy.

Let's try another search. That one worked. What if we search for, what have we got, K3? P4 is searching for K3. It sends the query to the super peer again. The super peer doesn't have K3 in its index, so it doesn't know where K3 is. So what does super peer 2 do? It uses Gnutella, or the same concept: it sends a query to other super peers. So the super peers also (let me try a different colour) know about other super peers. The blue lines indicate relationships or connections between super peers, in the same manner as Gnutella, where one peer had connections with its neighbours. In FastTrack, super peer 2 has 2 neighbours, SP1 and SP3. How do they discover each other? The same way as in Gnutella: some ping-pong, some exchange of messages to contact neighbours. So when a query comes to SP2 and it doesn't have an answer in its local index, it then sends the query in the Gnutella style, flooding it through the network
of super peers: it sends the query to its 2 neighbours. What happens? SP2 sends the query searching for K3 to SP1 and SP3. No, this is the query; the ping-pong is just the joining part in Gnutella, when you want to contact and find a neighbour. Once we've got neighbours, which these blue lines indicate, then those super peers simply look in their local index. SP1 sees we're searching for K3: "Okay, I have that in my index; the answer is P3." It sends the reply to SP2, SP2 can then send it to P4, and P4 knows K3 is available at P3. The search query goes along the other path as well, and in fact I think eventually it will go to SP3, SP3 will send to its neighbours, and SP5 will receive it and send back the answer P11, because the same resource is also available at P11.

So it's a combination of both approaches. Between peers and super peers we use the centralised Napster approach: if we find the resource within the same set of peers, via that super peer, we get a quick response. Searching for K1, we get an immediate response. If we don't, then the super peers communicate with each other using the Gnutella distributed approach, by flooding a message through the network of super peers. We may receive multiple responses back, and eventually we discover those peers that store the resource.

It's hierarchical: we have different types of nodes, peers and super peers. Yes, it's small local instances of the Napster approach, then connected at the top level, between the super peers, using the Gnutella approach. So the blue part is using Gnutella or a similar protocol, and the black parts are using the Napster approach. It is trying to take advantage of the fast search in Napster: send a query, and if that resource is available via the super peer we get a fast response. If not, then we need to flood, and the advantage is we don't have to flood through all the peers in the network. In Gnutella we flood through the entire network; if we have 100,000 peers, that's a lot of messages that need to be sent, but
with FastTrack, or the super peer architecture, let's say we now have 100,000 nodes, of which maybe a thousand are super peers and the rest are normal peers. Flooding is only done between the super peers, so there is much smaller overhead there.

And what if SP4 fails? If a super peer fails, the network can still work. If SP4 fails then we lose access to P9, P10 and SP4, but we can still access the other super peers. So even if a super peer fails we can still contact the rest of the network. We lose some of the network if a super peer fails, but not the entire network as in Napster. So it's a trade-off between the two extreme approaches.

Those are the three general approaches. They were used in specific systems, Napster, Gnutella and FastTrack, but the concepts apply to different systems as well. There's a fourth approach; we may not get a chance to cover it this semester, maybe if we have time next week, but I doubt it. So they're the main approaches for how we can structure a peer-to-peer network.

The challenge is locating resources. Resources are now distributed amongst the peers; they're not on one central server. So how do we know which peer has the file? We need some resource location system. We either have an index on the server, like the Napster approach; or we have a fully distributed approach, where we flood the network with a query to find the resource; or we somehow combine them, in this FastTrack approach or super peer architecture.

Once we find the peer that has the resource, we then want to download that resource: if it's a file, we'd like to transfer it. In large networks many peers will have the same resource. For example, when we searched for K3, P3 and P11 both had that resource. In a larger network there may be tens or hundreds of peers which have the same resource. What we'll cover next week is a way to download that file from multiple peers at the same time. So instead of just downloading from one peer, try and take advantage of the fact that
multiple peers have the same resource, and download some parts of the file from one peer and other parts from another peer, to try to spread the load across the network. So we'll go into that approach, which is used in BitTorrent, next week.
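As a recap of the super peer search walked through on the board, the following Python sketch combines the two mechanisms: a local index lookup at the peer's own super peer, then a flood restricted to the super peers. Treat it as a rough illustration under assumptions: the index entries follow the board example where it was stated (K1 at P6 via SP2, K3 at P3 via SP1 and at P11 via SP5, K6 and K2 via SP3), but the links in `SP_NEIGHBORS` beyond SP2's two neighbours, the `failed` parameter and all function names are invented here and are not part of FastTrack itself.

```python
# Which super peer each normal peer joined through (partly assumed).
SUPER_OF = {"P3": "SP1", "P4": "SP2", "P6": "SP2", "P7": "SP3", "P8": "SP3",
            "P9": "SP4", "P10": "SP4", "P11": "SP5"}

# Index kept by each super peer: key -> peers that registered it.
SP_INDEX = {
    "SP1": {"K3": ["P3"]},
    "SP2": {"K1": ["P6"]},
    "SP3": {"K6": ["P7"], "K2": ["P8"]},
    "SP4": {},
    "SP5": {"K3": ["P11"]},
}

# Links between super peers; only SP2's neighbours come from the lecture,
# the rest are assumptions for the sake of a runnable example.
SP_NEIGHBORS = {
    "SP1": ["SP2"],
    "SP2": ["SP1", "SP3"],
    "SP3": ["SP2", "SP4", "SP5"],
    "SP4": ["SP3"],
    "SP5": ["SP3"],
}


def search(peer, key, failed=frozenset()):
    """Return the peers known to hold key, as seen from the searching peer."""
    home = SUPER_OF[peer]
    if home in failed:
        return []                         # the peer has lost its entry point
    local = SP_INDEX[home].get(key)
    if local:
        return list(local)                # Napster-style: one query, one reply
    # Otherwise flood Gnutella-style, but only among the super peers.
    seen, frontier, found = {home}, [home], []
    while frontier:
        next_frontier = []
        for sp in frontier:
            for nb in SP_NEIGHBORS[sp]:
                if nb in seen or nb in failed:
                    continue
                seen.add(nb)
                hit = SP_INDEX[nb].get(key)
                if hit:
                    found.extend(hit)     # answer found, no need to forward
                else:
                    next_frontier.append(nb)
        frontier = next_frontier
    return found


if __name__ == "__main__":
    print(search("P4", "K1"))                   # answered from SP2's own index
    print(search("P4", "K3"))                   # needs the flood between super peers
    print(search("P4", "K3", failed={"SP4"}))   # a failed super peer elsewhere
```

The `failed` argument is only there to mimic the failure discussion: with SP4 marked as failed, P4 can still locate K3, while a search started from P9 or P10 returns nothing because those peers have lost their super peer.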