Last week we covered the peer-to-peer file sharing approaches. We covered the Napster-style approach, where we have a central server, similar to what I drew on the board here. We have some server, and we looked at how peers, how different computers, find resources. In the Napster approach the server stores an index of the resources. Think of the resources as files. When a peer wants to find a resource, it sends a query to the server. The server looks up the index and sends back a response, where the response is some identity of the peer that stores that resource. That was the simple approach. Then we saw two different approaches: Gnutella, where we have no central server but peers flood a request or a query saying who has this resource, and whoever has that resource responds; and then the FastTrack approach, which joins the two together using super peers. So we have a combination of the centralized approach using a server and the distributed approach using Gnutella-style flooding. That was focusing on the search part, which is one of the challenges for peer-to-peer systems: when you've got resources distributed amongst different peers, how do you find those resources? What we want to focus on now is a way of accessing the resources. With the Napster approach we said that the server sends back some address of the peer that has the resource. So if P3, let's make it a different one, has resource R4, and P1 does a query for R4, or in fact key 4, the server knows that P3 has resource R4, so it sends back a response, and now it's up to P1 to access that resource: to contact P3 and say, I want resource R4. We didn't cover how to access the resource. We're going to go through BitTorrent, see the basics of BitTorrent, and see how peers can access resources. And of course in a peer-to-peer system, if the system is large enough, usually there'll be more than one peer that has the same resource.
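As a rough illustration of the central-index lookup recapped above, here is a minimal sketch. The dictionary contents, the key "R7" and the function name query are assumptions made up for this example; they are not part of Napster or of any real protocol.

```python
# Hypothetical central index kept by the server: resource key -> peers holding it.
# "R4" at P3 follows the lecture example; "R7" is invented to show a resource
# that more than one peer has.
index = {
    "R4": ["P3"],
    "R7": ["P3", "P4"],
}

def query(index, key):
    """Return the peers the server believes hold `key` (empty list if unknown)."""
    return list(index.get(key, []))

# P1 asks the server who has R4; it must then contact that peer itself.
print(query(index, "R4"))
print(query(index, "R7"))
```

The server only answers the "who has it" question; fetching the resource from the returned peer is a separate step, which is the part the rest of the lecture deals with.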
So now let's focus on files as our resources. Let's say both P3 and 4 have the same file. P1 wants that file, it sends a query to the server, the server sends back a response saying P3 and P4 have the file. So P1 can download that file from P3 or P4. Well we'll see what BitTorrent allows us to do is to not download from just one of those peers but to download parts of the file from P3 and other parts of the file from P4. So instead of just downloading the entire file from one computer download some parts from one computer on the internet and the other parts say the second half of the file from another computer and once you've received all parts put them back together and you have the file. And we'll go through a very simple example to show that in general we can have the advantage of possible faster downloads because we distribute the load amongst those peers. So we're focusing now on the download part or the access to the resources. How does that peer get access to those resources? Where the resources in our example are files. So let's first go through a simple example, it's not specific to BitTorrent but in general for two different ways in which we can download the file. And then we'll go through the concepts of BitTorrent and show how the protocol works. Let's keep it simple. Let's consider a case where we don't have a peer-to-peer system. You want to download a file and that file is stored on a server. So we have some server in our network and we have some peers or some computers, not a peer-to-peer system, we have some clients that want to download that file. And let's say we have five clients at the start. So this server stores the file and these are some clients that want to download. They all want to download the same file and let's say the server has a, to keep our calculations simpler, 1,000 megabit file, that's the size of the file, 1,000 megabits, I use megabits so we can calculate divided by the data rate easier. 
And the link that each device has into the network is 1 megabit per second; those are these links here. Same with the server, just to keep it simple in this case. This cloud could be a simple switch or it could be the entire internet, but for simplicity all devices have a 1 megabit per second full duplex link, which means that at the same time they can download at 1 megabit per second and upload at 1 megabit per second. So like a 1 megabit per second link to your home, that's the capacity. Let's say the server has the file, and five users come along and want to download that file, all at the same time. How long does it take to download the file? What does it depend upon? The server bandwidth, okay, the capacity of the server's link, assuming the server computer is fast enough. To download, we contact the server and the server effectively transfers the file to our computer. How fast can the server send the file to the computer? Well, this computer can receive at 1 megabit per second. The server can send at 1 megabit per second, so 1,000 megabits at 1 megabit per second would take 1,000 seconds. That's if there's one client downloading, but there are five clients wanting to download from the server at the same time. In that case the server is sending the file to client one and at the same time sending the file to client two, client three, client four and client five. If we focus on the server's link here, it can send at 1 megabit per second in total, but because it's sending to five different clients, it's effectively sending at one fifth of the speed to each client. It's sending the file to client one at 200 kilobits per second and, at the same time, to client two also at 200 kilobits per second, and so on, assuming the capacity is shared fairly, that is equally, amongst the different clients. In your assignment you saw that that's usually the case under ideal conditions.
So the speed at which the server is sending to each of the clients is 200 kilobits per second, one fifth of the capacity of the link. Because we have five clients, we need to share that capacity amongst five. How long does it take for one client therefore to receive the file? Your computer one, how long does it take until you've downloaded the entire file? 5,000 seconds. If it was only this computer downloading, we've got 1,000 megabits at 1 megabit per second, 1,000 divided by 1, 1,000 seconds, but now we've got five computers downloading, so we've got one fifth of the speed, so that takes five times as long. After 5,000 seconds, computer one has downloaded the file. And also, so have two, three, four and five, because they all have the same download speed. So one server transferring the same file to five different clients using unicast, there's no multicast here, to five different clients at 200 kilobits per second each, 5,000 seconds to download the file. Any questions, the mathematics for this, make sure you, it's a very simple concept, but we'll use that in the next parts, the more important parts. Now five users downloaded that file, it's a popular file, now 100 more users come along into the network and they also want to download that same file. I'm not going to draw them, but imagine now there's 100 more computers here, different ones, same speeds, want to download the same file from that same server, how long does it take them? One of them, 100,000 seconds, because we've got, our limit is this link to the server. The capacity is 1 megabit per second, if we've got 100 computers downloading, we must share that 1 megabit per second amongst 100 users. Same file size, if we were sending it 1 megabit per second it would take 1000 seconds, if we're sending it 100th of the speed it would take 100,000 seconds. So here's that 100th computer, it takes 100,000 seconds for that computer to get the file. 
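The arithmetic above can be checked with a few lines of Python. This is only a sketch under the lecture's simplifying assumptions: the server link is the sole bottleneck and is shared equally among the clients. The function name download_time_s is made up for the example.

```python
def download_time_s(file_mbit, link_mbps, n_clients):
    """Seconds for each client to finish when n_clients share one link equally.

    Each client gets link_mbps / n_clients, so the time is
    file_mbit / (link_mbps / n_clients) = file_mbit * n_clients / link_mbps.
    """
    return file_mbit * n_clients / link_mbps

# The lecture's numbers: a 1,000 megabit file over a 1 Mb/s server link.
for n in (1, 5, 100):
    print(n, "clients:", download_time_s(1000, 1, n), "seconds each")
```

With 1, 5 and 100 clients this gives 1,000, 5,000 and 100,000 seconds, matching the figures worked out on the board.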
The more computers downloading from that server, the slower it's going to be. This is if we just have a server that everyone is trying to access, because the download speed depends upon the capacity that the server offers. That's what I've tried to capture on this slide with this simple example, where we've got two phases. First five users come along and download the file, then some time later another 100 users want to download the same file, all accessing the server. The first five users get it in 5,000 seconds, the next 100 users in 100,000 seconds. Let's try something different to speed it up. Let's say the first phase is the same: five users come along and download the file from the original server, and it still takes them 5,000 seconds; nothing changes. Now in our second phase the new 100 users come along and, instead of downloading from that server, they download from the five users who have already downloaded the file. Because after the first phase the server has the file, and so do computers 1, 2, 3, 4 and 5. When the 100 users come along, instead of all contacting the server, 20 of them contact computer 1 to download the file, another 20 contact computer 2, and so on, up to another 20 contacting computer 5. That is, those 100 users download the file from the five different clients that have already downloaded it. So, where can I write it? There are 20 users downloading from client 5, and the same math applies: we have a limit of 1 megabit per second and 20 users downloading from there at the same time, so each of those users gets one twentieth of 1 megabit per second and takes 20,000 seconds to download, because we are only sharing that capacity amongst 20 users. Similarly, another 20 users are downloading from computer 4, the capacity of 1 megabit per second is shared amongst 20 users, and therefore it takes those 20 users 20,000 seconds to download the file. Let's try to capture it on the next slide.
So, a different scenario where the first five users do the same, they download from the central server, but then the next phase where 100 new users come along, they don't download from the server, they download from the other users who have already downloaded the file and it would only take them 20,000 seconds to download the file. Effectively we are sharing the load, instead of everything going by the server, now we are sharing it amongst the other computers, so instead of being limited by this 1 megabit per second link, we actually have five 1 megabit per second links that those 100 users can download via. And hence, we go from 100,000 seconds to 20,000 seconds to download, a five times improvement, five times shorter, this is the concept that many peer to peer file sharing applications or protocols use in that there are multiple peers that have the file, rather than downloading from one central server, the clients download from multiple peers, with the idea of distributing the load amongst those peers rather than concentrating the load on one server. 
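To compare the two scenarios side by side, here is a small sketch using the same idealised assumptions as the lecture: every link is 1 Mb/s, each source's upload capacity is shared equally, and the downloaders spread themselves evenly over the sources. The function name phase_time_s and the variable names are made up for the example.

```python
import math

def phase_time_s(file_mbit, link_mbps, downloaders, sources):
    """Seconds for one phase when downloaders are spread evenly over sources.

    The busiest source serves ceil(downloaders / sources) users, and each of
    them gets an equal share of that source's upload link.
    """
    per_source = math.ceil(downloaders / sources)
    return file_mbit * per_source / link_mbps

FILE_MBIT, LINK_MBPS = 1000, 1

phase1 = phase_time_s(FILE_MBIT, LINK_MBPS, 5, 1)         # 5 users, 1 server
server_only = phase_time_s(FILE_MBIT, LINK_MBPS, 100, 1)  # 100 users, 1 server
from_peers = phase_time_s(FILE_MBIT, LINK_MBPS, 100, 5)   # 100 users, 5 peers

print("phase 1:", phase1)
print("phase 2, server only:", server_only)
print("phase 2, from the five peers:", from_peers)
print("speed-up in phase 2:", server_only / from_peers)
```

This reproduces the 5,000, 100,000 and 20,000 second figures and the factor-of-five improvement. Changing the number of sources or downloaders shows how sensitive the gain is to how many peers are willing to upload.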
That's a simplistic example. Of course such a scenario will not happen in practice; it depends on many different factors. We saw in the first case that if everything comes from the server, it takes 5,000 seconds in the first phase and then 100,000 seconds in the second phase. But if the first five download from the server, which takes them 5,000 seconds, and the others then download from the ones that already have it, the second phase only takes 20,000 seconds. That is, we speed up the download by distributing the load amongst the different clients, and we get a significant gain in performance, because it takes much less time for all 105 users to download the file. How much gain do we get from such a scenario, how much of an improvement in performance? It depends on how many initial servers there are. In my example we started with just one, but of course we may have multiple initial servers. These five users who initially downloaded the file, and whom the others then downloaded from, we can call seed users: they start the download for those 100 other clients. So the performance improvement depends upon how many seed users, or how many seeds, there are. In my example I had 100 users downloading in the second phase, and how many there are will also impact performance, so the number of downloading users matters too. And in my simple example everyone had the same upload and download rate, 1 megabit per second. In real life different users in the internet will have different upload and download rates, and that will affect how fast, in total, every user gets the file. In this example, if we download from the other clients, it took the 5,000 seconds plus another 20,000 seconds. So we can use a peer-to-peer network to download files faster, potentially, by downloading not from a central server but from other users who have already downloaded the file. And in fact it doesn't have to be limited to the whole file. How do we state this? User one downloaded the file from the server initially and
then we said other users could download from user one. It doesn't have to be done a whole file at a time: user one can download parts of the file from the server, and once it has those parts other users can start downloading them from user one. So the way that we structure the file can also help. If we divide the file into pieces, say our 1,000 megabit file into pieces of 10 megabits each, then as soon as user one has the first piece other users can start downloading from user one, and that would potentially reduce the time for those other users to get the file. We'll see that peer-to-peer file sharing protocols use these concepts of downloading from other users at the same time, not just files but pieces of files. Those are some of the basic concepts of peer-to-peer file sharing. What are the advantages of doing this? We're not always dependent on the original server. Once some of the users have downloaded the file from the original server, those 100 other users don't need to know about the original server; if the original server fails, they can still download the file from these five other users. So there's no dependence, or less dependence, on any one server in the network. There's redundancy in the network, in that the file is available from multiple different locations, and the more people that download the file, the more locations a new user can download it from. That's an advantage. And we distribute the load: instead of the server being the bottleneck, with everything going via the server and limited by the server link and the server speed, we distribute that load amongst different users. What's the problem with this? It relies on the participation of other users. It only works if these first five users, once they download the file, make it available for others to download from them. If the 100 users cannot download from users 1, 2, 3, 4 and 5, we get no gain. So this relies on those that have downloaded the file making it available to others, and these five users
in this simple example were called the seed users. As the number of seeding users decreases, say we have just three instead of five, the performance decreases. And of course in practice, in a large network, this set of users may vary over time: at one instant there are 100 seed users, and five minutes later, because some of them have disconnected from the network, there are only 80. So the performance will vary over time. The other problem with peer-to-peer file sharing is that there's no control over access to the content. If everything has to go via a single server, the server can control who accesses that content. For example, it can require a username and password, so you need some credentials to download the file. But in this peer-to-peer approach, once these first five users have downloaded the file, the server has no control over who else can access it. That's a problem in some scenarios. It's difficult to control who accesses the content, and also to find the content, which comes back to the techniques we looked at last week for indexing the content. If computers one, two, three, four and five have the file, how do other users in the network know that those five have it? We saw some approaches last week for how to search and how to index resources. Much of the use of peer-to-peer file sharing is for distributing content that, in many places, is illegal to share, such as copyrighted content, so there are a number of issues about whether it's good or bad. There are also issues with security: can we trust the users that we're downloading from? And what if we want to make people pay to download that file? With a central server we can control who accesses it and have some accounting to keep track of who needs to pay. In a peer-to-peer system that's much harder. So there are many other issues in making peer-to-peer file sharing work, but
we're not going to touch upon those issues. We're going to look at how the protocol works, how the data transfer works, and at the advantages in terms of network performance, using one example, one of the most popular peer-to-peer file sharing protocols, called BitTorrent. There are different protocols that use these concepts. BitTorrent is one of them; Gnutella is another protocol for transferring files. And there are different applications, pieces of software that implement these protocols and provide some user interface. It gets confusing, because in what we're discussing we think of BitTorrent as a protocol, a means for exchanging information, but there's also a piece of software called BitTorrent, and there are other applications that use the same BitTorrent protocol. As long as they implement the same protocol, they can communicate with each other. There are other protocols too, Gnutella, Napster and so on, and different pieces of software. We're going to focus on the protocol called BitTorrent, so let's see how it works. First we'll go through the terminology and then the basic concepts. In BitTorrent a file, in fact not just a single file but a set of files, is referred to as a torrent. In my example I said we had a 1,000 megabit file; we could say that is a torrent, but a torrent can actually contain multiple files. We'll see the structure of a torrent in a moment, but we can think of it as describing a set of files, not just one file. A file with the extension .torrent is a file that describes that torrent; we'll see an example of the format. The description includes a list of the files that are included. So the concept of a torrent refers to a set of files, and a .torrent file describes that torrent. The users that are accessing a torrent are called peers, so all the users that are
going to access this torrent, to download this file, are peers. These peers may be downloading or uploading the files, or both. A peer that has downloaded the entire torrent, that has the entire file or set of files, is called a seed. That's like our initial five users in the simple example: once they've downloaded the entire file, they make it available for others to download from them. These peers are called seed users, or seed peers, or just seeds, and we say that they are seeding the torrent. The other peers, which haven't yet downloaded the entire file, are referred to as downloading peers or downloaders, sometimes leechers. These are peers that want to get the entire file but only have a portion of it at some point in time. Once they get the entire file, the entire torrent, they become seeds. So peers in the network are either downloaders or seeds. A swarm is the set of users connected to a torrent, that is, all the peers, including the seeds. So say we have some network with some computers attached to it, any set of computers in the internet, each with some IP address. For a particular torrent, example.torrent, we may have some downloaders and some seeds among those computers, and that set of users, the combination of the downloaders and seeds, is called a swarm. So all of these computers, the three seeds and the four downloaders, are accessing a particular torrent. Before we go to a specific example, a few more concepts. Instead of having to download an entire file before you make it available to others, the file is broken into pieces. I say a file is divided into pieces; to be accurate, a torrent is divided into pieces, and a torrent can be a single file or many files. Once a piece is downloaded by a peer, it's made available for others to
download from it; that is, that peer can upload it to others. So when this peer has downloaded a particular piece, other peers in the swarm may download that piece from it. Pieces are further divided into blocks. So a torrent is made up of files, the torrent is divided into pieces, and pieces are divided into blocks. They usually have some defined size; we'll give some examples later. Let's see a picture of that, shown here. Say we have three related files that we want to share amongst a set of users: file one, two and three, of different sizes. This example torrent contains these three files, so think of the three files as joined together. The .torrent file, and we'll see an example of its contents, describes those three files: the file name, for example, the size and other features. The torrent is divided into equal-size pieces, except for the last piece, which of course may be smaller than the others. As an example, say we have three files: file1.txt, 1 megabyte; file2.mp3, 9 megabytes; and file3.avi, 990 megabytes. The torrent which contains these three files is then 1,000 megabytes, and our peers want to access this torrent; they want to download it, which is equivalent to downloading the three files. What's a typical size of a piece? The numbers vary, but sizes such as 512 kilobytes, about half a megabyte, or one megabyte are typical. Let's say each piece is half a megabyte. Then this torrent is divided into 2,000 pieces. Once a peer downloads one of those pieces, it can make it available for other peers to download from it. Each piece is divided into blocks, and a typical size of a block may be 16 kilobytes. We'll see when we look at the download protocol that you actually download
blocks at a time. Blocks are of equal size, and the last piece may be smaller than the other pieces. We'll see that the set of pieces we download does not have to be in order, and normally it's recommended not to be in order. For example, we may download random pieces. But we want to get an entire piece before we move on to another piece. And the way to download a particular piece, we'll see, is a simple protocol where instead of saying download the entire piece, you download the first block, then the next block, and so on. So you download it a part at a time, in case something goes wrong, for example if a checksum fails. The protocol for requesting data, we'll see, works at the block level, while the piece level is what we use when we number the pieces and say which ones a peer has. How many pieces does the seed here have in our example? 2,000 pieces. Remember, by definition a seed has the entire torrent, and our torrent of 1,000 megabytes has 2,000 pieces. So the seed, if we listed them, has piece 1, piece 2, and so on up to piece 2,000. It has all of the pieces, whereas the downloaders only have a subset of the pieces. By definition they don't have all 2,000; they may have many pieces, but fewer than 2,000. We'll come back to the way that we request individual pieces in a moment. For now let's just go through the terminology: a torrent is made up of one or more files, a torrent is divided into pieces, pieces are divided into blocks, and the set of seeds and downloaders that are accessing a particular torrent is called the swarm. There's a special server that manages the list of peers in a particular swarm, called a tracker. The tracker has a list: if we give the peers these numbers, and these
are our peers, then the tracker would have a list that says that for example.torrent, if that's the name of our torrent, the peers are 3, 5, 7, 8, 9 and 10. Of course it keeps the actual addresses of those computers in the internet, normally IP addresses and port numbers. So the tracker keeps a list of all peers that are accessing a particular torrent: IP addresses, port numbers and other information. We'll see some examples, including some statistics about how much they've downloaded and uploaded. So a tracker is needed for a particular torrent. There may be multiple trackers in the internet, and in fact there may be multiple trackers for a particular swarm as well, but for simplicity say there's one tracker for this swarm that keeps track of which peers are accessing that torrent. In practice these are computers on the internet, and they may appear and disappear at different times. Here's a computer; it may leave the network. Therefore the tracker must get regular updates as to who is in the swarm, because if this one disappears it needs to update the list to show that it's no longer in the swarm. Another entity is an indexer. We need to have a list of the available torrents in a particular network. Here we're focusing on just one torrent, which I've called example.torrent, and which refers to three files, file one, two and three, but there may be many torrents in a particular network. An indexer is a server that manages a list of torrents, normally a website that allows a user to search for .torrent files by name and other information. And once you find the .torrent file from the indexer, we'll see that the .torrent file describes the tracker. So let's see how that works. Now, with those different entities, indexer, tracker, seeds and downloaders, four different types of nodes, and the concepts of torrents, pieces and blocks, let's see the basic steps for
using BitTorrent. We have a peer, someone's computer, running a piece of software that implements the BitTorrent protocol. This peer is identified by an IP address in the internet and a port number, so instead of some number 10 we actually have an IP address and a port number. Initially this peer, which wants to download a particular torrent, doesn't know any other peers. The peer contacts an indexer and from the indexer obtains a .torrent file. So to start downloading a torrent you need the .torrent file. Where do you get it from? You get it from an indexer. The indexer maintains a list of .torrent files and some description of each of them, for example on a website. The user does some search for what they're looking for and gets a .torrent file as a response. How do they contact the indexer and download the .torrent file? That's not part of BitTorrent. There's a BitTorrent protocol, but how to do that part is not specified; that's separate. The normal approach is using a web browser. There are web servers that act as indexers: they have a list of .torrent files you can search, and you download those .torrent files, usually using HTTP. Once you have the .torrent file, the peer software uses it in the next step. The .torrent file describes not only the files in the torrent but also the tracker or trackers; it says what the address of the tracker is. So once our new peer here has the .torrent file, inside that file is the identity of the tracker, its address and port number. The peer then contacts the tracker and asks it for the addresses of the other peers in this swarm. So our new peer contacts the tracker and says, who is currently in this swarm? And the tracker sends a response saying the current peers in this swarm are 3, 5, 7, 8, 9
and 10. Once that's done, our new peer knows the addresses of the other peers in the swarm, and it will contact them to try to download parts of the torrent. So the tracker, as we said, keeps a list of the peers in the swarm, and a new peer contacts the tracker to get that list. How does it contact the tracker? There's a protocol for doing that, which we'll see an example of. Once that peer knows about other peers in the swarm, it contacts some of them and starts downloading pieces of the torrent, some pieces from one peer and other pieces from other peers, until eventually it has downloaded all the pieces and has the entire torrent. Then it becomes a seed in that swarm. And of course once this peer has started to download some pieces, other peers who also want that torrent can download from it. During this download phase the peer also maintains contact with the tracker, giving updates that it's still in the swarm along with some statistics about how much it has downloaded and uploaded. As a result the tracker knows about all the peers in the swarm and how much each has downloaded and uploaded. When another new peer comes along, again it contacts the tracker and then starts contacting peers. So there are three main steps. A new peer obtains a .torrent file from an indexer, normally using a web browser and a web server, over HTTP. Once it has the .torrent file, which describes the tracker, the peer uses a BitTorrent protocol to obtain the list of peers from the tracker. Then it uses a different protocol to download pieces from peers in the swarm. We need to go through those two protocols, the one for contacting the tracker and the one for downloading the pieces. First let's look at the structure of the .torrent file. Let's say our new peer has contacted an
indexer and downloaded example.torrent. The .torrent file is a smallish file that describes the torrent. What does it contain? It's a text file of a particular format, and the main thing it includes is information about the files. Our example.torrent would include information about file1.txt, file2.mp3 and file3.avi, the set of files included in the torrent: in particular the file names, the length of each file, how big the pieces are, and an identifier for each piece. That identifier is obtained using a hash. Everyone remembers hash functions? Without going into the details: given some file, if we apply a hash function to that file we get an identifier as output that is, for practical purposes, unique. The SHA-1 hash function returns a 160-bit identifier. If we have a different file and apply the same SHA-1 hash function, we get a different identifier as output. The identifiers look random; it's just a way to uniquely identify a file. When we have many different files we can't just use file names to identify them, so we use hash functions. So what's inside the .torrent file? Information about the files in the torrent is one set of information. There's something called the announce, which is really an identifier for the tracker: it's a URL where the peer can contact the tracker. In fact there can be more than one tracker, so there can be a list of trackers. There's a creation date, saying when the torrent was created, and by whom or by what software, and maybe some description of the torrent, some comments. Here's part of an example .torrent file. This is a couple of years old; it was for an Ubuntu CD, so to download the Ubuntu operating system. It doesn't contain the files, it describes the files in the torrent. We don't need to worry about the format; there's some definition that says that, okay,
we have some field: a field name and a field value. I've tried to highlight them in different colors to identify them, but of course they're just text in a file. So this is saying this is the announce field, and the value is a URL, which is the address of the tracker. So the announce field identifies who the tracker is in the internet, and the peer will use that announce field to contact the tracker. Some comment describes the torrent, there's a creation date in some date format, and then some information. In this case it contains the length of the torrent, or actually the length of the file; in this case there was just one file inside the torrent, though potentially there can be multiple files. It's about 732 megabytes. The name of the file is an ISO, so a disk image. The length of each piece is 524 kilobytes, so about half a megabyte. So in this example our file is 732 megabytes and our piece is half a megabyte. Then for each piece there is information about that piece, in particular its hash value. What's shown here doesn't look like it makes sense because a hash value, as we'll see when I show you an example, is a random-looking sequence of bits, so in ASCII it doesn't come out as any meaningful characters.
What's the mistake in this slide? There is a mistake in one of the numbers. How many pieces should there be? Look at the size of the file; you've got a calculator. The file is approximately 732 megabytes and each piece is half a megabyte, so there are about 1,400 pieces in the file, not 27,960 pieces. The number of pieces is simply the file length divided by the piece length. This number, 27,960, is the length in bytes of these hash values; it's not the number of pieces. Each SHA-1 hash is 20 bytes, so 27,960 bytes of hashes means 1,398 pieces in this torrent. That list holds a value for each of these roughly 1,400 pieces: if we divide the file into different pieces and then we take a hash of
each piece we get some unique number, and that's what's listed here. There'll be about 1,400 numbers, one identifying each of those pieces. So a piece has a unique ID, a unique number.
As a quick example of hash functions: SHA-1 is a common hash function, and MD5 is another hash function you may have seen in different systems. We have some files; as a first example I have an audio file from this semester and an MP4 file. We can calculate the hash. I have some software called sha1sum; run on the particular file, this is the hash value. It just calculates an almost unique identifier, and it's 40 characters long, that is 40 hexadecimal digits, or 160 bits long in this case. That's the common hash function used in BitTorrent. If we have a different file we can calculate another identifier, and that should be unique in the set of all the files in the world: if you take the hash of a file and the two files are different, you'll get two different hash values, two different identifiers. Just to illustrate that, there are two text files here; one says "hello 1", one says "hello 2". If we calculate the hashes of those files, which differ just by that last character, the hash values are completely different, as if they're random identifiers.
So the idea is that any piece in our torrent has a unique ID. That's useful for checking the integrity of the file that's downloaded. You can check that because the torrent lists all the hash values: once you download a piece, you can calculate its hash value and check that you have the right piece, so that someone cannot put a fake piece into the file without you detecting it. So there's some security mechanism in place. Just think of them as unique identifiers for pieces, and in our case those roughly 1,400 values are listed in the .torrent file; they would be listed here. So that's an example of a .torrent file that's obtained from an indexer. Once our peer has this, it
looks in the announce field and then it contacts the tracker. The announce field is a URL; it tells the peer to contact the tracker, in this case at torrent.ubuntu.com and a port number, and the method for contacting the tracker is HTTP. So how does a peer contact the tracker and communicate with it? The tracker in fact implements a web server, an HTTP server, and the peer communicates with the tracker using HTTP. It's not for web browsing; it's just using the same protocol for communicating some information between the peer and the tracker. How it normally works is that the peer sends an HTTP GET request to the tracker, where the URL contains information about what the peer is doing: some identity of the peer, the port number the peer is using, the number of bytes that it has uploaded and downloaded so far, the bytes remaining to download, and whether it's currently downloading, has stopped downloading, or has completed, that is, it's become a seed.
The peer sends a request containing this information to the tracker, and the tracker sends back a response including the interval at which the peer should contact the tracker. The peer doesn't just contact the tracker once; it does it on a regular basis, say every 30 minutes. How often? Well, the tracker tells the peer how often to contact it. The tracker also tells the peer the number of seeds and the number of downloaders, or leeches, in the swarm, and it gives a list of those peers including their IP address and port number. So the peer sends a request to the tracker, and the tracker sends back a response containing a list of peers, the interval at which it should contact the tracker, and some warning or error messages if needed. Then there's some optional information that may be included, but that's the basic idea. And here's another example of that, trying to show the request, the blue one, going from peer to tracker and the response coming back from the tracker to the peer, and I've
tried to structure the request and response to identify some of the important parts. It's using HTTP, so the request is just a GET request, where we have GET followed by some URL. As you know, with URLs you can give parameters: the format is some field equals some value, separated by an ampersand sign for multiple fields. I'm sure you've seen that in different URLs. So this is the information being sent from peer to tracker: an info hash value, the identity of the peer, the port number that the peer is using, 11541, the number of bytes it's uploaded, and this is just at the start so it hasn't uploaded any bytes yet, zero, the number of bytes downloaded, zero, and the bytes left, all bytes, the 732 million bytes in this case. This is just at the start; of course, as it starts to download and upload bytes, those values will change. What else? There are some different fields here. What's the status: whether we've started, stopped or completed the download; in this case we've just started. And some other information about the peer: the IP address of the peer, and, I think, the number of peers that it wants back in the response; it wants to receive a list of up to 200 peers in the response.
So the peer sends this to the tracker. The tracker, which keeps a list of all other peers, sends back an HTTP response, and inside that response is, what have we got, the number of peers that have completed the download, which is the number of seeds in the swarm, and the number of peers that have not completed the download, which is the number of downloaders or leeches. So the tracker tells the peer how many downloaders and seeds we have in the swarm at this point in time, and then an interval at which the peer should contact the tracker. This is 1,800 seconds, so every 30 minutes in this case. So 30 minutes later the peer will send another request, updating the values for uploaded, downloaded and left, to inform
the tracker of its current status, and we'll get a response. Then we have a list of peers. It's not shown here, but it will contain information about the different peers in the swarm at this point in time, including IP addresses and port numbers. As a result, our new peer knows the addresses of other peers in the swarm. So once we receive the response, our peer knows about some other peers and can contact those other peers.
So there are three steps, remember: a peer contacts the indexer, the peer then contacts the tracker, and then the peer contacts other peers to download the torrent. The first step, contacting the indexer, is done using some other method, normally just accessing a website, and once you contact an indexer you download a .torrent file. That .torrent file includes information about the swarm and in particular the address of the tracker. So in the second phase the peer contacts the tracker using these types of messages, using HTTP, sending a request to the tracker; the tracker sends back a response containing a list of peers. Then in the third phase the peer starts downloading pieces from other peers.
Any questions on how BitTorrent works? Exam in one week; there's a question on BitTorrent in the exam. Any questions so far? Easy? What's hard? The exam's hard, yes, but what's hard about BitTorrent? Of course you don't have to remember the structure of these messages, but try to understand the concepts: what are the roles of the different entities that we have, indexers, trackers and peers, the two different types of peers, seeds and downloaders or leeches, what defines them, and how they communicate with each other.
The last part, and maybe the most important part, is that once a peer knows about other peers in the swarm it needs to start downloading some pieces. How does it do that? How do the peers exchange pieces? Because we'll see that as one peer wants to download pieces from another peer, they may also eventually want to upload pieces to another peer as well, to exchange
pieces in both directions. So there's a peer exchange protocol. We have 10 minutes left and I don't think we'll finish today. Do you want to try to finish today or continue tomorrow? Let's finish tomorrow. So tomorrow what we need to do is explain how the peer exchange protocol works. Let's start on that tomorrow; we'll have a bit more time to go through it, and you'll be able to ask some questions about the earlier parts.
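As a recap of today's numbers, here is a minimal Python sketch of three things covered above: counting pieces, checking a downloaded piece against its SHA-1 hash, and building the GET request a peer sends to the tracker. It is an illustration under assumptions, not a real client. The tracker address `http://tracker.example.org/announce`, the exact file length, and the placeholder `info_hash` and `peer_id` values are made up for the example, and all function names are hypothetical. The query parameter names follow the ones commonly used in tracker requests.

```python
import hashlib
from urllib.parse import urlencode, quote

SHA1_BYTES = 20  # a SHA-1 digest is 160 bits = 20 bytes = 40 hex digits


def piece_count(file_length, piece_length):
    """Number of pieces: file length divided by piece length, rounded up."""
    return (file_length + piece_length - 1) // piece_length


def split_piece_hashes(pieces_field):
    """Split the concatenated hash bytes from the .torrent into 20-byte hashes."""
    if len(pieces_field) % SHA1_BYTES != 0:
        raise ValueError("hash field length must be a multiple of 20 bytes")
    return [pieces_field[i:i + SHA1_BYTES]
            for i in range(0, len(pieces_field), SHA1_BYTES)]


def piece_is_valid(piece_data, expected_hash):
    """Integrity check: hash the downloaded piece and compare with the listed hash."""
    return hashlib.sha1(piece_data).digest() == expected_hash


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left,
                       event="started", numwant=200):
    """Build the tracker GET URL as field=value pairs joined by '&'."""
    params = {
        "info_hash": info_hash,    # 20 bytes identifying the torrent
        "peer_id": peer_id,        # 20 bytes identifying this peer
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "event": event,            # started, stopped or completed
        "numwant": numwant,        # how many peers we would like back
    }
    # quote_via=quote percent-encodes raw bytes instead of using '+'.
    return announce + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    file_length = 732_900_000   # assumed: "about 732 million bytes"
    piece_length = 524_288      # the piece length from the slide
    print(piece_count(file_length, piece_length), "pieces from the lengths")
    print(27_960 // SHA1_BYTES, "pieces from the hash field length")
    print(hashlib.sha1(b"hello 1").hexdigest())
    print(hashlib.sha1(b"hello 2").hexdigest())
    print(build_announce_url("http://tracker.example.org/announce",
                             bytes(20), b"-XX0000-000000000000",
                             11541, 0, 0, file_length))
```

Running it shows the two ways of getting the piece count agreeing, two very different hashes for inputs that differ by one character, and a first tracker request with uploaded and downloaded both zero.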