Hello, so I'm going to start with my talk about mirrors, about content delivery networks and about delivering free software. My name is Peter, and if you haven't heard of CDN before, CDN is short for content delivery network: networks that are specialized for pushing stuff to users who want to get it. I don't want to bore you with technical details about all this today; I'd rather share my vision. I have a vision for the future, how to improve on these things, and this I want to share with you. I hope you find it interesting, and maybe we can stimulate some things that we can do together later.

So what is it about? If you look at the world of free software projects, there are many, many projects that produce stuff for users and give it to them. There are larger ones and smaller ones, and all of them share the same challenge. Some of these projects provide content that is really huge, like CD images or larger software packages like OpenOffice, while others provide smaller bits that are less problematic but may also be highly popular and mirrored around the world, like the Linux kernel for example.

Solutions to get this stuff to users exist in three forms. The first is the commercial content delivery network. The commercial CDNs are specialized in that, like Akamai or YouTube. They would be very nice to use for open source software projects, but they are really expensive. They do what they do well and you have to pay a lot for it. The second type of content delivery network is what the academic world came up with during the last 10 to 20 years. There are some approaches that are highly interesting, but they never reached production state.
Often they are too complex to really be implemented. Often they have, for example, features like bandwidth and latency measurements between users, and you might need nodes around the world to do these measurements, and client feedback has to be provided to some servers. This is hard to implement, because then it doesn't work with a normal web browser, for example, and you need infrastructure, real machines placed somewhere that do this work. So the academic approaches are sometimes nice to read about, but only few of them are really in use; only one, actually.

And the third approach, which is very popular and also very traditional, is to use mirrors. Mirrors are server machines contributed or provided mostly by universities, by ISPs or by private persons. So you download stuff from a mirror. There are about, let's say, 300 mirrors around the world which provide this service, and many of them you know well because you come back to them again and again. So how do we deal with these mirrors? How is this organized?

Let's go to some examples. I have put some logos here of some approaches for this task. These are the commercial guys. Akamai is used by Apple, Microsoft and Novell to provide their software updates. Limelight is actually providing YouTube, for example. Then you all know SourceForge. SourceForge has a few mirrors, but very large ones. They have a web front end approach for users to get to this stuff. Another approach is MirrorManager from Fedora, which again has a slightly different approach. It is not working on file level, it's working on directory level. So it doesn't know exactly whether a particular file is on a mirror, but it has some kind of state database which knows roughly what the mirrors have. There is Bouncer; Bouncer is used by Mozilla and by OpenOffice. There are actually two versions of Bouncer, and one of them is able to distribute client requests on a geographic basis to mirrors in that region, which Fedora also does.
They didn't mention that. So that's probably something that you always want to do, because connections to close mirrors always work better, or typically work better. There is another Bouncer version that does not support this, which is used by OpenOffice. There is the Debian-style approach, which basically just does a schematic assignment of mirrors on a country and DNS round-robin basis. There is the Mandriva approach, which Per knows much better than me, and we have a microphone, probably better.

Yeah, first of all I implemented it using metalinks, which we still do, but we had it on the server side, where it generated metalinks based on the coordinates in latitude and longitude and calculated the distance to the nearest mirrors. That would be done on the server side, but it would require a lot more, so then we switched to doing it based on the user's time zone and coordinates on the user side. So now it just generates metalinks locally and automatically picks mirrors, which it fetches from the Mandriva mirror list, which is updated every now and then. That's about it.

That's a very advanced and very nice system actually, especially the new metalink generation. The geographic coordinates, do you get them from GeoIP or...? Okay, so the client provides info to the server and the server decides where to send it.

No longer.

Okay. But it does provide the time zone to the server; yes, that's what I mean. So the client provides some info to the server which allows it to select a mirror. This also implies that this approach requires a specialized client, so you couldn't use it with Wget or a normal web browser. I mean, web browsers send something like a language header, so if you go to the site then it can decide on that, but sometimes that's wrong. Anyway, let's not talk too much about metalinks and focus for a moment on the comparison of these approaches again. MirrorBrain is what SUSE came up with two years ago, and it has been developed further since then.
It is an approach that does not require a specialized client, but it can work with specialized clients to do a more advanced mirror selection. And as I said, I don't want to bore you with technical details. But there are two other approaches that are a bit similar; these are those academic guys. Coral CDN is actually highly interesting and it's working to some extent, but it has the disadvantage that it requires you to use different URLs. You have to append another host name to get something from the network. So again it's not transparent to the plain HTTP and FTP protocols that many clients use. CoDeeN is the only candidate that might reach more popularity in the future. It is in some production use. I actually know a few mirrors that take part in it and use it for some specialized things, like a US American mirror delivering stuff to Singapore, I think. He has been using that and finds it works well.

So all these are a little bit different from each other and may or may not require a specialized client. That's one of the differences. And if you look at this picture, this is what I see when I look at the mirror framework landscape. There are lots of different frameworks and they are separate. Apart from the few I just showed you, there are many, many others. The Apache Software Foundation, for example, has quite a simple redirector where you can choose mirrors manually, and many other software content providers have some very small solution. So the need is there, and everyone tries to solve it in a simple way. In a few minutes I will talk a bit more about why it isn't so simple to do it in a simple way. It's quite a challenge, actually, to assign clients to good mirrors and also to provide the client a way to fall back to another mirror and so on. These things are called cairns.
I don't know if you know the word. If you go hiking in English-speaking countries then you often see them. They are actually useful: they can show the way, you can mark a path and see where you go. But this is not leading anywhere. So what we should rather have is something like this. This is the roof of a church, and it's a building that has been built by collaboration and cooperation. So how do we get from here to there?

I think we really should introduce more collaboration on these things. What you often see happening is that communities are separate, like these, like the boys and the girls. They don't talk to each other, they are afraid of each other. You can also see this in open source communities: you care about your vicinity, but you don't really know what the others do. There's a lot to learn from each other; they have something that you don't have, and vice versa.

So I already mentioned that dealing with mirrors is not as easy as it might look at first, and I will describe some reasons why this is not easy. I'm going to show you with a little example. The example is picking mushrooms and then deciding which ones you want to eat. If you ever did that, then you know: you look into books, the internet, each mushroom might be growing together with others that are not the same type, and you have to look at each one carefully and so on. It's quite a complicated business. So let's try to explain how the selection of mirrors can be done against this background.

The first question would be: does the mirror have the file in question? So we need to scan the mirrors, which can be done from a central location. The mirrors don't have to have some software on them, because they already provide the content via HTTP, FTP or rsync, so we can look at them.
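As a minimal sketch of the scanning step just described, the following Python checks from one central place which mirrors carry which files, using nothing but the HTTP service the mirrors already offer. The function names, the use of HEAD requests and the injectable `probe` parameter are assumptions made for illustration; the talk does not say how the actual scanner is implemented, and a real one would also cover FTP and rsync.

```python
import urllib.error
import urllib.request


def http_head_probe(url, timeout=10):
    """Return True if a HEAD request for url is answered with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # 4xx/5xx answers, DNS failures, refused connections and timeouts
        # all count as "this mirror does not have the file right now".
        return False


def scan_mirrors(mirrors, paths, probe=http_head_probe):
    """Build {path: [mirror base URLs that have it]} by probing every mirror.

    mirrors: base URLs of the mirrors (hypothetical examples in the test).
    paths:   file paths relative to each mirror's base URL.
    probe:   callable(url) -> bool; replaceable so the scan can be tested
             without touching the network.
    """
    index = {path: [] for path in paths}
    for base_url in mirrors:
        for path in paths:
            url = base_url.rstrip("/") + "/" + path.lstrip("/")
            if probe(url):
                index[path].append(base_url)
    return index
```

Calling `scan_mirrors(["http://mirror.example/pub/"], ["some/file.iso"])` would issue one HEAD request per mirror and file. The result is a per-file view of the mirrors, which is the kind of knowledge a file-level redirector needs before it can send a client anywhere.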
Another question would be: is the mirror close? Are the mirrors in question close to the client, so they could provide good service?

Is the mirror trustworthy? No, mirrors are never trustworthy, because they could always be hacked or broken, or there could be a broken firewall in between, and they could deliver garbage or actually manipulated stuff. So it is very useful if you can sign your content and provide the signatures or verification hashes together with the content. Or for some files it may make sense to just do it yourself and send the file yourself. If it's not a large file, then that's fine. So, for example, all the signature files in your file tree you can just deliver yourself. They may not even be larger than an HTTP redirect, which is also 1500 bytes.

There may be private mirrors: mirrors marked as private that are only meant to be used by a limited group of network clients. Mirrors can have very different performance. There are big ones, better ones, and you have to prioritize them and try to achieve a load balance between them. For the larger files you may actually want to verify that the server is able to deliver them correctly. About 20% of mirrors can't do that, either on FTP or HTTP; they are broken in this regard. Many mirrors are useful, but if they have to provide DVD images and extremely large content, then they just go down to their knees, so it may be useful to exclude them from delivering big files. And then you have to monitor whether the mirrors are actually available, because mirrors have to be rebooted, they die for various reasons, and you have to monitor them quite closely and no longer send clients there if there are problems.
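The criteria listed so far (does the mirror have the file, is it close, is it private, how powerful is it, can it deliver large files, is it alive) can be combined into one selection step. The Python below is only a sketch under stated assumptions, not the logic of any of the frameworks mentioned: the `Mirror` fields, the `rank_mirrors` name, the country-then-region notion of closeness and the 2 GiB threshold for "large" are all hypothetical. Private mirrors are simply skipped here, whereas a real redirector would still serve them to their own network range, and it would spread load randomly in proportion to priority instead of sorting deterministically.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    country: str                  # country code of the mirror, e.g. "DE"
    region: str                   # coarse region, e.g. a continent code
    priority: int = 100           # relative capacity; higher is better
    online: bool = True           # result of the last availability check
    private: bool = False         # only meant for a limited group of clients
    large_files_ok: bool = True   # passed a large-file delivery check
    files: frozenset = frozenset()  # paths found on the mirror by scanning


def rank_mirrors(mirrors, path, client_country, client_region,
                 size_bytes=0, large_threshold=2**31):
    """Return usable mirrors for path, best candidate first.

    large_threshold is an assumed cut-off (2 GiB) above which only
    mirrors that passed the large-file check are considered.
    """
    candidates = [
        m for m in mirrors
        if m.online
        and not m.private
        and path in m.files
        and (size_bytes < large_threshold or m.large_files_ok)
    ]

    def sort_key(m):
        # Closeness first: same country, then same region, then the rest.
        if m.country == client_country:
            closeness = 0
        elif m.region == client_region:
            closeness = 1
        else:
            closeness = 2
        # Within the same closeness, prefer higher priority; the name
        # only makes the order stable and reproducible.
        return (closeness, -m.priority, m.name)

    return sorted(candidates, key=sort_key)
```

The function returns a ranked list instead of a single mirror on purpose: the first entry is where a plain redirect would go, and the rest are what is left to try if that one fails.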
Clients may actually send some preference along with their request, and you might want to respect that, because sometimes the client just knows better what's good for him and what works for him. So it would be good to have provision for that. And finally, you cannot just go ahead and choose one mirror, because this mirror might just not be available, or it might not have the file. So you more or less have a sorted list of better and not so good mirrors, and you can give fallbacks to the clients.

So these were some things that make mirror selection not such an easy task. It's not something that you implement in a day. And actually, most of those problems you're not even aware of when you start. I certainly wasn't when I started. You rather learn about these problems during your deployment and development, and you start to collect experience, and there are the different use cases. All this, I believe, is solved in the MirrorBrain infrastructure, and I also believe it is solved in a way that would be very useful for other content providers to use. And I lost track.

Okay, after talking about the server side for so long, it might be useful to talk about the client side, the other end, for a few slides. You all know classic HTTP and FTP clients and web browsers, but you may also have heard of metalink clients. Metalink clients are specialized download clients that combine FTP, HTTP and also BitTorrent into a powerful download client that can work intelligently and fail over. If it encounters errors, connection problems or broken content, it can verify this, and it can actually continue downloading from elsewhere. These clients can also download in parallel, so they can try to max out your internet connection and get the content faster. These clients are, let's call them, intelligent. The metalink client needs information to do its work, and the information for this
job is provided to them by what are called metalinks. A so-called metalink is just a mirror list, formatted in XML so it's machine readable, and it can also include hashes and signatures for the files. So the client has all that it needs to successfully download the file from somewhere. What really happens is a knowledge transfer from the server, who knows the mirrors, to the client, who wants to use them, and this works pretty well. I have a nice quote on that from Anthony, the guy who invented metalinks, which I was delighted to read.

And actually this combination is really a powerful combination, because this is what really makes things work. You can have the best server and mirror database and mirror scanning and mirror selection in the world, but as long as the client is just a stupid FTP client or HTTP client, it will just follow the redirect that you suggest to him, and then it will either work or not work. Whenever you want to have something like "try again" or "try another mirror", you have to have something on top of HTTP and FTP, and this is what metalinks do.

So back to the larger picture. If you look at the world map, there are quite some countries and regions that are far apart, and there are also parts of the world with a lot of internet connectivity and parts with less. Looking at this map, I can give you some more reasons to believe me that it's not easy to select mirrors. You want to assign a mirror that is close to the client, but often this doesn't work by just measuring the distance, because the network topology looks extremely different from this. I will show you a few examples. The first example is New Zealand. New Zealand is, okay, quite a simple case. It is in an edge location, right there at the end, and it's easy to see that over there they have proper
connectivity, but to the rest of the world it is much worse. They also have some connectivity to the west coast of the US, I have heard, but I have heard this from someone and I actually don't know. This is one of the problems. So while this New Zealand case is pretty obvious, there are still some things that you need to find out. Anyway, it's a good rule of thumb to just send clients from New Zealand to an Australian mirror, but if there is no Australian mirror, then you already have to decide which one is next. And the chances are that these are not good, because they don't have much interconnectivity. Often connectivity to the internet centers of the world is much better than from here to there, because, especially in Africa, people are often connected with satellite links that go to where they want to go, not to their neighbors.

Another interesting case is Russia. Russia is an extremely large country, and from a lot of feedback I got, I learned a few things. I learned that China to Russia doesn't work. This continent called Asia would be the normal unit of geographical thinking when you use a certain library, GeoIP, to locate a client and look up where it is. In this case Asia as a unit doesn't work well, because there's no good connectivity between those large parts. Other special things about Russia are that, for example, Ukraine can't get to these mirrors that are here, maybe for political reasons, I'm not sure. Russian users have very bad connectivity to other Asian mirrors, which admittedly are quite far away. Russian mirrors have good connectivity to German mirrors. So you really have to handle some special cases and treat certain countries specially: never assign to there from Ukraine, but always assign to Germany, or something like that. It makes it rather interesting, and any simple scheme will not work for all the countries and for very long. As soon as you start to learn about all these
particular cases, it becomes quite complex.

Another interesting case is South Africa. South Africa, at the tip of Africa, has, I don't know, five mirrors, and the rest of Africa has none. So it's really quite concentrated here. The neighboring countries actually have decent connectivity to South Africa, from what I've heard, but Mozambique, for example, doesn't get good connectivity to African mirrors. So Mozambique is better assigned to German mirrors again, because the satellite, the international link, goes there. So this is another interesting exception, a regional particularity.

If you think about internet connectivity, then I think for most people in this room it more or less looks like this, like a high-speed motorway. But often we are not aware that for other people it may look completely different. There is a large part of the world, where more people live than in the well-educated, rich and well-connected countries, that doesn't have connectivity as good as this. A child in Africa may not have the opportunity to learn, to be educated, but educational information is key for living a healthy life and finding a job and so on. And people in less well-connected areas also need to download stuff, for simple reasons. They need to download software like OpenOffice. OpenOffice plays quite a role in this, because it's a free office productivity suite and it's really needed by people around the world; that's also why it's so popular. And people have big problems downloading OpenOffice, because it's about a hundred megabytes, a bit more, and that can be very hard to download. I have two quotes here. I quoted them from memory, but this is what some people said who are affected by the situation. So I believe that mirror selection that helps them would be very worthwhile.

If you look at the percentage of people in the world who have internet access, that's 22 percent, so four fifths of the world doesn't
have internet access at all. And of these people who have internet connectivity at all, I have some numbers here about the percentage of those who have a broadband connection, like DSL, something fast. In Germany it's about 24 percent, in Korea it's eight percent, in the Slovak Republic it's only six percent, which is amazing because it's in the middle of Europe, and in the poorer countries it ranges between one and three percent. So there are a lot of users who have only bad connectivity and no broadband, and we can help them a lot.

So back to the question: how do we get from the unorganized, chaotic, separate solutions to the big solution? This is practically how I see the mirrors organized. It's just that they are not well organized, not as clearly as here; it's a very loose organization. The thing is that the mirrors, those guys, are the same for Fedora, for Ubuntu, for OpenOffice, for SUSE and so on. You always meet the same guys again. So each of these mirrors carries a lot of projects. What the user sees, or what you think about, is this mirror and that mirror, but actually this machine mirrors several projects, this one is actually the same, and this one is very similar. It's just another file tree, different layouts, different sync times and a different set of projects that are mirrored. But in principle you could see these elements and try to think about how to get structure into this.

So instead of setting up MirrorBrain for every project, which would involve quite some overhead because each of them would have to know the mirrors and keep its own database and so on, you could also have one big database which knows about the mirrors, about the servers, the contact persons and the content providers, and keeps all this together. And this is actually not just a dream; it's not far from being implemented. I'm working on a next-generation database schema that provides for that, so this becomes possible. It's not a big step, it's just making things
easier, so you don't have to store a mirror many times. So this nearly exists. And a common database could actually lead to another thing, a common file tree, because those mirrors could actually have the same file layout, which would make it even easier to find your way around on them and to have a database that reflects their file tree. But at this point it might already become difficult again, because nobody knows if this is ever possible to implement; it would involve changes on every mirror. I can say from experience that of these 200 mirrors that I work with for SUSE, for about 50 percent I have contact persons, and for the other half I don't even have a contact person. I can't find out about one, I don't reach anyone, even if I try by phone. It's very hard to get hold of people that are so far away, and they don't publish an email address, or it's completely outdated. It would be an illusion to say, okay, let's get all these together and change everything, because it would never happen. And these mirrors are also very different, with very different operating systems.

Syncing of mirrors is also an interesting topic for collaboration and for improvement, because everybody has some scripts that are better or worse, and it would be very interesting to have something working which we can share. So altogether this could form some free content delivery network that would actually deserve the name content delivery network, and wouldn't just be what we have now, which is some isolated solutions and very different mirrors. And as you might understand now, the business of selecting mirrors is not so easy, and most other solutions won't get that right, because it's a lot of work.

So here's my call for collaboration. I want mirror owners and content providers and users and researchers to join a mailing list and to talk about these things, because
it's very important to talk about this, and it's always enlightening. I talked to the Fedora guy, to Per from Mandriva and so on, and it's always enlightening, because these specialized guys also have lots of good thoughts about this. And mirror admins from around the world have their picture of their region, of what happens there and what the political situation is, why there is no connectivity between North and South Korea and so on. We need to get this knowledge together. The interesting thing is that a mailing list like this doesn't even exist. There is no common forum for these things. I think there was a newsgroup, like 20 years ago, where these people were gathered, but now there's no place to meet them, unless you join the Fedora mirror mailing list or the SUSE mirror mailing list or the OpenOffice mirror mailing list, and then you always meet the same community. There's no shared forum where the content providers are together too, but there's a lot of potential in getting them together.

So this basically is what I wanted to share with you. Thank you for listening, and I hope you have some input.