Hello, and thanks for coming in, thanks for your interest. I'm going to talk about the download redirector. Even though this is a generic, reusable piece of code which could be employed by other projects too, I'll be talking about it with some openSUSE context, because that's where we use it. I hope it's nevertheless understandable for you; if I use too many openSUSE-specific terms, don't hesitate to interrupt me. To introduce myself: I have been working at SUSE for 8 years, on the download infrastructure and also on the build service, and in the past I have done a lot of web-related work, so I always end up with web stuff.

So, what do we serve? We have lots of products: the released distributions, one and two and three of them, then unstable snapshots, several architectures, then there are sources, debuginfo packages, test trees, other test trees, and then the build service, which recently adds a lot on top of that. That doesn't sound too bad, does it? But it really is a lot. Altogether there are about 700,000 files right now in the openSUSE tree, and just to stat all those files takes a long time, as you can guess. The size meanwhile adds up to a total of nearly a terabyte, and we also get lots of download requests for that stuff. So, well, it doesn't sound too bad, does it?
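To get a feeling for what "just stat'ing all those files takes a long time" means, here is a minimal sketch, assuming Python; `tree_stats` is a hypothetical helper for illustration, not part of the redirector:

```python
import os

def tree_stats(root):
    """Walk a directory tree, counting files and summing their sizes.

    Stat'ing every file is exactly the per-file cost that makes
    scanning a tree with hundreds of thousands of files slow.
    """
    n_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            n_files += 1
            total_bytes += os.stat(os.path.join(dirpath, name)).st_size
    return n_files, total_bytes
```

Run over a ~700,000-file tree, this loop performs one `stat()` system call per file, which dominates the runtime even before any data is read.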
The problem is that there's no single big fat pipe which can handle that; nobody has that kind of internet connectivity. And there also isn't a mirror which can handle that much stuff. I can quote a mirror admin here, who basically said: "oh, that's a real heavyweight; Debian and Ubuntu together are just 200 gigabytes." So naturally, the commonly used solution to that problem is to employ a content delivery network. There are lots of them, and they can handle it, but they are dead expensive; it's not really an option for us. So we need some kind of poor man's CDN; this is basically what we're trying to build.

Luckily, mirrors come to help. Thinking about mirrors, it's interesting to note that they don't cooperate with us in any way that we control; they mirror us for their own benefit. Some make a business of it, some also want to help us, and they mirror us whether we want it or not. We don't have much influence. What we can do is facilitate mirroring for them, which is in our interest. In the past, mirrors used to be equal, and equally useful, because they were either mirroring us or not; if a mirror was mirroring us, and it was fast enough for you and close enough to you, then you could simply pick it, stick to it, and do everything via that mirror. But today they are really different, because they don't all mirror the complete trees, and for users it has become impractical to pick one mirror and stick with it. With all these trees that we offer, and their size, there is virtually no mirror that can mirror everything, so there are no complete mirrors. Most mirrors carry only some parts, and mirrors can also be outdated; maybe they haven't caught up and are still serving already-superseded releases. So what we see are partial mirrors, and really only a few sites can actually mirror a whole terabyte. It's actually hard to find anyone to mirror the largest trees; not even the largest mirrors have capacity for that at the moment. I mean, they have lots of capacity, but they have no free capacity right now to add this to their mix.

So we see partial mirrors, and if that wasn't enough, our content also has an extremely high turnover rate, because it's not a tree that's released three times a year: we have the build service and continuous updates, and they are released basically all the time. It could be every few minutes that new packages turn up. That's actually faster than it can be synchronized, so mirrors are always a somewhat blurry picture; there are no fully up-to-date mirrors to be had. So what we have are partial mirrors. What can we do with that? Ah, before the answer, another thing which makes it even more complicated: the binary packages that we offer for download are referenced in metadata, and this metadata contains checksums and cryptographic signatures. If something is wrong, or packages come in the wrong version from a mirror, the client will trip over it. So, back to the question: what can we do?
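Because the metadata pins exact checksums, a client notices a stale or wrong file from a mirror immediately. A minimal sketch of such a verification, assuming Python and SHA-256 checksums (the function names here are illustrative, not from the actual client):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def verify_package(path, expected_sha256):
    """Return True if the downloaded file matches the metadata checksum."""
    return sha256_of(path) == expected_sha256
```

A package fetched from an out-of-date mirror fails this check, which is exactly why redirecting clients to mirrors that don't have the right file version breaks downloads.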
One option would be maintaining mirror lists and trying to keep them up to date, but that's extremely hard to maintain; they are too static, and they are never, ever correct. So mirror lists work only to some extent, for human users: if a human downloads some ISO images, that's maybe two or five files, so it's possible, or feasible, for them to try two or three mirrors and find out which one works. But it's still annoying when mirrors are out of date, which they are, and this trial-and-error process can lead to an effect which could be called the SourceForge effect, because that's what happened to SourceForge some years ago. In the process of finding a suitable mirror, people will all end up trying to use the same mirror, because it's the one that has given them the least headache; at SourceForge, after some years, every user went to the Irish mirror. That makes such mirrors overloaded, so they're not really good anymore.

So there is one solution, which is highly dynamically generated mirror lists; they virtually need to be created in real time. And even though we can't exercise any control over mirrors, we can watch them carefully and see what's there. So that's what we are doing: we redirect clients to mirrors; that's what the redirector is about. In order to be able to do that, we need to know what the mirrors have. What we need is an inventory of files, a kind of file list of each mirror, which we need to keep up to date, which we do by scanning them; and then we need to periodically probe the mirrors to see whether they are online at all and alive. With that in place, we can redirect the clients' HTTP requests to mirrors, and this is something that works with any ordinary HTTP client, so it doesn't require any specialized clients. And while we're doing that, we have the opportunity to locate the client, to see where in the world it is located by its IP address, and we can load
balance the requests based on the mirrors' capabilities, so we can take weak mirrors and send them fewer requests, if we know those mirrors. We also have the opportunity to enable sensible caching of content, because caching is always important: it frees the lines and saves bandwidth. But we need to set Cache-Control headers precisely for that, and we can do that, because all requests go through our own infrastructure. We can also decide to deliver files directly instead of redirecting, so that is another means of control. That's also why we use only HTTP, not FTP: FTP gives us no way of doing cache control, while with HTTP we can let content be cached for something like 6 to 24 hours.

The inventory that we need is obtained by scanning mirrors, mainly via rsync, or via FTP where rsync is not available. The inventory is kept in a database, and there is an interesting characteristic of this inventory: when we push new stuff to the servers, old stuff becomes obsolete, and files are deleted on the mirror, but in the inventory that stuff will only disappear with the next scan of the mirror. That's no problem for us, though, because the files disappear first from the master itself; once they are gone there, we don't need to redirect to them or deliver them, so there is actually no need to consult the inventory for them. It will be cleaned up later, and it doesn't matter.

The whole idea was, to my knowledge, first implemented in full by Christoph Thiel, who presented it two years ago. It didn't scale for very long: at first it was used only for some parts of the distribution, but when we tried to use it for more parts, it didn't scale well enough. It was redesigned a year ago and rewritten as an Apache module in C, which is called mod_zrkadlo, and there are some similar frameworks with similar approaches which also inspired the design. So if you wonder about
the name, which nobody can pronounce or memorize: I came up with it because, during the time I was thinking about this stuff, I visited Slovakia and went to a concert at a venue called Klub za zrkadlom, and later I learned that this means "behind the mirror". So it seemed like a very suitable name.

So, how does it work? Here's a step-through. The redirector first looks at whether the file can, or should, be redirected at all. Then it canonicalizes the file name, because there are usually lots of symlinks in the trees, and we would otherwise end up with lots of duplicate files in the database; so we normalize the file name and resolve the symlinks. Then the redirector looks up the country and continent of the client. Then it looks up possible mirrors for that client and sorts them by proximity: it takes note of mirrors previously used by that client, so it remembers a kind of association between the client and previously used mirrors; it prefers mirrors from the client's own country, then looks at mirrors from the same region, and then at the rest of the world. After sorting, it chooses one of the best available mirrors at random. This random choice is also influenced by a weighting value that we define for each mirror, which gives it greater or lesser chances. If no mirror was found, the redirector may serve the file directly.

This scheme has lots of advantages. The redirects are very cheap, so it scales well. It integrates cleanly with the web server, because it's transparent, and it gives you control over when to redirect and when not; this is the maximum control that you can have, or want to have, over how content is served. It proved very scalable during the last year: when we released the last product, we had no problems at all; it was so solid that I could actually go on vacation. Other advantages are that the centralized approach gives us the opportunity to
count downloads. It's also possible to integrate a real content delivery network, a kind of wildcard mirror which is always added into the mix, which we also did with the recent release. And then there's something special: we can serve dynamically generated mirror lists instead of redirecting, the kind of mirror lists I talked about, so the client actually gets a list of mirrors and figures out for itself which one to use.

All this makes small and partial mirrors useful for us. In fact, the times when we needed large, powerful mirrors are nearly over, because we won't find many mirrors that can carry the extremely huge trees we offer. What we actually need, and what we can have, is lots of mirrors which are not large; we don't need a terabyte, maybe 60 gigabytes is enough. If they just mirror the most popular 10% of the content, we are fine, we can have more mirrors, and everybody's happy. And one of the best properties of the whole thing is that it's not openSUSE-specific, so it can be used with other download services too.

Disadvantages are that mirrors die without warning; mirrors break all the time, it's just natural, and the reliability of the entire framework is only as good as its mirrors. So what we do is monitor mirrors as closely as possible, and if a mirror fails, we disable it automatically in the database, stop redirecting to it, and check back later. But there's a certain time window between the failure occurring and us detecting it and actually disabling the mirror, because we can't check all the time. And there are also always weird failures which are very hard to detect or track down, like firewall problems that you only find out about after two or three weeks of debugging, with things that sometimes don't work but work most of the time. Here is some interesting potential for the openSUSE download client: it could make use of the mirror lists and actually fall back to other mirrors. It's also important to know that with this kind of infrastructure, every request runs through the central
server, so this server needs to be up; it needs to be a high-availability setup with proper load balancing. You can set up several redirectors, that's not a problem; for us it's mostly a problem of budget, because we still have only a single machine for that, but I hope that will change in the future. So it's not a design problem of the redirector, it's just a budget problem with the hardware we have. But other than throwing hardware at it, it's also possible to make the client use mirror lists as a fallback; this is my call to the openSUSE folks, and I hope we can implement that in the future. Downtimes in general are often acceptable for human users, if they don't occur too frequently, but they are often very bad for machines, because software acts up, and users don't want to deal with error messages. So we can, and I think we really should, make sure that our specialized client has special support for dealing with failures and, yeah, falling back to mirrors.

Let me go through the rest a bit quickly, because I'm running out of time and want to save some for questions. I could expand on the number of optimizations that we did in the process of setting this up: there was some database optimization, since it's crucial that it scales well enough; other optimizations are that we try to respect certain mirrors' special needs, or the special needs of remote regions like New Zealand that have particular internet connectivity, so there is special support for all of this. To give you some numbers: download.opensuse.org, which served our last product release in the autumn of last year, delivered a substantial amount of content per second directly to clients, and that counts only the content that we did not redirect; what we redirected to mirrors is not included here. And as you see, we get several million requests each day.

I'll probably not discuss the other approaches in detail now, because time is running out. There are some other things, like DNS-based tricks: they would work well, but they require a split
which probably won't happen. Real content delivery networks work by adding intelligence to standard DNS. There are static mirror lists, which are no longer feasible. There is a similar Apache module, but it requires interaction with the mirrors. The Mozilla project has the so-called Bouncer, which is basically a similar approach, and the Fedora approach is also very similar, with the difference that they have logic on both the client and the server side working together, which we like.

As for what we have with the current set of rsync modules: they are all a bit large, and we need to think about splitting them up more, and at least offer more fine-grained, well-thought-out rsync modules for mirrors to pick from, like the 10% most popular stuff I mentioned. And we need to make sure that we don't just keep putting more stuff into the trees that gets blindly mirrored and just fills up disks. We need to improve our infrastructure, as I mentioned; we need to do better monitoring; and we still have some old infrastructure left, like the mirror lists that were maintained in the old system, which were pretty static and edited by people; they are not integrated into the new system yet. There are certain further ideas which I will skip now, but we can talk about them. Thank you for listening; ask me if there are any questions.

Q: How do you currently determine whether a mirror is dead? What is your criterion for declaring it dead?

A: We check every 3 minutes; we actually do a request on the base URL. We would like to improve on this by checking real files, like random file downloads, things like that, and we need to expand on checking for large-file capability and so on. So there is a lot of potential there.

Q: Did I get it correctly that you also have some algorithm in place that does a kind of load balancing, so that you don't always redirect to the one potential mirror even if it fits?

A: Yes, that is the weighted randomization that we do: we give the mirrors weights individually. By that we make sure that a small mirror only gets very few requests, but it
is still useful; and a small mirror might even get more requests for files that only this mirror has, so it all plays together. Related to that, we also have a way of making sure that a mirror in, say, Israel only gets requests from Israel, or that a mirror in Australia is the preferred mirror for clients in New Zealand, because their traffic mostly goes via Australia anyway.

Q: But that is knowledge that you have to enter manually?

A: It is probably possible to obtain some of it automatically; some things might be possible to find out from client feedback, where a client tells us about mirrors that worked well for it. So there is potential there as well.

Q: Maybe just ping three times, or do a traceroute from the client, count how many hops, and feed that back to see what kind of connection there is. And are there any plans for third-party repositories?

A: Good question; I haven't really thought about it yet. We can talk about that with the build service folks. It might be problematic because of files that we may not be allowed to distribute, but for other stuff it could work. Right now the implementation expects the files to be present locally; that helps us a lot in determining whether a file exists, whether it is useful to redirect it, whether it is fresh, whether it can be cached, and things like that. So it might be difficult.

OK, thank you very much.
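The weighted randomization discussed in the answers above can be sketched in a few lines of Python. This is a hedged illustration only: the real redirector is an Apache module written in C, and the mirror names and weights below are made up:

```python
import random

# Hypothetical mirror pool: (base URL, weight). The weight reflects the
# mirror's capacity, so big mirrors get proportionally more redirects
# while small mirrors still see a trickle of traffic.
mirrors = [
    ("http://mirror-big.example.org", 100),
    ("http://mirror-small.example.net", 5),
]

def choose_mirror(candidates, rng=random):
    """Pick one mirror at random, with probability proportional to its weight."""
    urls = [url for url, _weight in candidates]
    weights = [weight for _url, weight in candidates]
    return rng.choices(urls, weights=weights, k=1)[0]
```

With these example weights, the large mirror receives roughly 95% of the redirects, yet the small mirror is never excluded, which is exactly the "small mirrors stay useful" property described in the talk.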