This is intended to be a slight change of gear from most of the talks, which are technical in nature. I thought DebConf would be a really good opportunity to show you how we are using Debian for real to do cutting-edge scientific research, which hopefully benefits millions of people across the world. There is some technical stuff here, but there's also some general description of what the Sanger Institute is, what it does, and the sort of things we're trying to do. I wasn't quite sure how to target this, so before we start I'd like a show of hands: how many of you are vaguely involved in bioscience, bioinformatics, medical research, that sort of thing? Okay, so we do have a few, but the majority are not. So those of you who are, please excuse me if I have to explain stuff that feels like teaching your grandmother to suck eggs. There are two of us giving the presentation: myself, Tim Cutts, and this is Simon Kelley. We're both members of the Sanger Institute's Informatics Systems Group, a sort of internal consultancy within the institute. Our job is to provide an interface between the regular systems people and the scientists. That's why I have Doctor in front of my name: I'm actually a biological scientist by training; geeking is just my hobby. Simon and I are both Debian developers as well, so it's quite good fortune that we get the opportunity to be here. Anyway, here's how the talk's basically going to work. I'm going to start with an introduction to the Sanger Institute.
Then a little bit about why we chose to use Debian and exactly where we use it, because we don't use it for absolutely everything, but we do use it in a wide range of places. I'm going to talk about Ensembl, the Sanger Institute's flagship data product, which is an interesting example of the high-performance computing work that we do, and which is built entirely on Debian. For those of you who are hardware geeks, there are some pretty pictures of nice hardware porn for you to look at. Then Simon will talk about the Lustre open-source cluster file system that we use. In fact, I'm going to steal his thunder a bit: we don't actually use the open-source version of Lustre, we use HP's commercially supported version, but it is still an open-source product underneath, and it's an interesting little story of how we shoehorned that to make it work with Debian. And then, since we're running several hundred nodes on Debian, I'm going to talk a bit about how we install, configure, and manage all of this stuff. So, about the Wellcome Trust Sanger Institute, which from now on we'll probably mostly refer to as WTSI because it's easier to say. It's funded by the Wellcome Trust, a big charitable organization, and its original purpose was to participate in the public version of the Human Genome Project. It's now diversifying into sequencing a large number of other genomes: disease organisms, other mammals, other vertebrates. Ensembl now covers about 25 genomes, something like that, which scientists can download and compare and so on. And a large part of this, once you've got an animal's DNA sequence, is what we call annotating it, so actually marking on it what it actually does.
There's a bunch of disease research projects going on, particularly into diseases which affect large portions of the population, especially in the developing world; we've got research projects on plague, malaria, and diseases of that nature. And the thing that should appeal to everybody involved in Debian, because it's a similar sort of philosophy, is that everything we do is given away for free, for people to do what they like with it. So this is where we work, the Wellcome Trust Genome Campus. Part of the reason I'm showing you what a green and beautiful place it is is that we're recruiting, so if you like the look of the stuff we describe today, come and talk to us afterwards, because we need a couple more sysadmins. There's the Sulston Building, which contains labs and offices and all that sort of stuff. The West Pavilion, that little room at the end there, is where all of the DNA sequencers live for the production sequencing operation, which goes on 24 hours a day, seven days a week. So this isn't just an academic research institute; there's a certain degree of almost industrial-scale work going on here. That's our brand-new data center, 1,000 square meters of machine room, a very nice little piece of building; the offices that Simon and I live in are just in front of it. We share the site with the European Bioinformatics Institute, which those of you who are biological people will know about. We have a nice restaurant and a gym, so you can work off the food you've just eaten. And there's even a nature reserve; the Wellcome Trust are very keen to show off their green credentials, so a lot of the land they use has been maintained as a nature reserve. There's also a conference center. Most important, if we change the view slightly, there's a nice pub just nearby, within staggering distance, so that's great.
So, what we initially did: the Human Genome Project. It's allegedly finished; the phrase used was "completed", in April 2003. Well, the human genome is one of these things where you say, what do you mean by finished? It's actually continually being revised even today. But the Sanger Institute was the biggest single contributor to the public Human Genome Project. I'm flag-waving a bit there; if you tot up what the American labs did between them, they did more than us, but in terms of a single institute, we've done more of it than anybody else. And as you can see, that involved generating and storing a fair bit of raw sequencing data, which the scientists currently insist on keeping, because, well, you never know when you might need it again. This is a problem we're about to get in spades. I'm just going to skip quickly past the next slide, but is that even remotely readable to you guys down there? It probably isn't. It's just a chart of the various sorts of work we're doing as well as the sequencing. How's my laser pointer? Does it work? Only just. So I've color-coded these things, and I apologize to people with red-green color blindness; I've just realized this is going to look awful. Anyway, the square boxes up here are the laboratory techniques. We've got things like the regular genome sequencing; types of sequencing which look for variations in human DNA, so we can hopefully develop methods for things like targeted drugs; microarray expression data; and some automated histology, looking at exactly how certain genes and proteins are expressed. These are general mechanisms for generating data, which is why they're in square boxes. And then we have huge amounts of automated processing that we want to do on all of these things, and this is why, of course, we end up needing vast numbers of computers.
And eventually all of this stuff goes into databases, which are used for the disease research, but are also put out onto the internet for you to download and read if you feel so inclined. So where do we use Debian in all this? Well, the first place is actually on the desktop. A lot of DNA sequencing and DNA assembly software was originally written on UNIX systems; at Sanger it was mainly originally written on Tru64, because at the time they were starting out, on Alphas, Linux's support for large files was not complete. We knew that the human genome is three billion base pairs long, so if you want to represent that as a single file, you need something that supports 64-bit file offsets. The natural assumption 15 years ago was that we needed a 64-bit architecture, so we went for Alpha. That actually led to some interesting code porting problems when we subsequently went to Linux, because of course we initially went to 32-bit Linux. Everybody's familiar with the interesting problems of porting 32-bit code to 64-bit, but you actually get the reverse as well: you suddenly find that somebody's assumed that a long is always eight bytes. Anyway. So we've got 300 desktops running Linux for some of the interactive software that people need. We've currently got 583 high-performance computing cluster nodes, with between two and four processors each, which comes to a total of currently about 1,600 processors. The website that Sanger produces obviously gets an awful lot of attention from the world in general, so we've actually got over 100 machines that operate the website in one way, shape or form. And from the industrial-scale point of view, our production DNA sequencing has in the last six months been moved onto Debian. It was on Tru64, but it's now entirely run on Debian systems, which is quite nice.
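To put rough numbers on that large-file point (an illustrative sketch of mine, not something from the talk): a signed 32-bit file offset tops out just under 2 GiB, which is well short of a single-file human genome at one byte per base.

```python
# Illustrative arithmetic: why a signed 32-bit file offset can't address
# a single-file human genome, but a 64-bit one trivially can.
GENOME_BASES = 3_000_000_000        # ~3 billion base pairs, one byte each
MAX_32BIT_OFFSET = 2**31 - 1        # largest signed 32-bit offset (~2 GiB)
MAX_64BIT_OFFSET = 2**63 - 1        # largest signed 64-bit offset

print(GENOME_BASES > MAX_32BIT_OFFSET)   # True: 32-bit offsets overflow
print(GENOME_BASES < MAX_64BIT_OFFSET)   # True: 64-bit offsets are ample
```

This is the same reason 32-bit userland tools without large-file support (the tar and gzip problems mentioned later) fall over on genome-sized files.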
And we've also got dozens of odd little servers here and there, doing random bits and pieces that people want, like MySQL. Sorry, I may be giving away my personal feelings about MySQL there. But where don't we use it? It's important that you understand there are places where we don't. We still have quite a lot of legacy Tru64 stuff around; obviously HP don't sell that any more, and they don't really support it much either, so it's all going to go away, replaced by Debian clusters on x86 of various descriptions. We've got around 20, maybe more, Oracle servers, and of course we've got the usual Oracle support matrix problem: if you have a problem with Oracle and you call them up and say you're running it on Debian, they'll go, oh, thank you very much, bye-bye. So we have to run that stuff on SUSE, which is okay, but it presents us with a little bit of administration complexity. We also have a couple of SGI Altixes for the bits of code where the users either are too lazy to, or can't work out how to, run it on a distributed compute farm. Those are still running SGI's version of Linux, but as etch now supports Altixes properly, they will get etch as soon as I can persuade the users to give me enough time to shut them down and upgrade them. And I definitely do consider putting Debian on them to be an upgrade. We also have about 500 Windows desktops and associated servers, which we're just never going to get rid of; they're necessary for one reason or another, although that's nothing to do with me. So the next bit is why we chose Debian, and there are some non-technical reasons why we chose Linux, and then Debian in particular.
First of all, when HP dropped the Alpha and Tru64, we got burned quite badly by that, especially with regard to the file system. We used the AdvFS file system on Tru64, which is quite a nice file system, and of course HP came to us and said, well, your upgrade path is to go to HP-UX on Itanium, and we will give you AdvFS and all this fun TruCluster stuff you've been used to on Tru64. And we went, we're not quite sure that's going to happen. And of course it turned out that we were absolutely right, and HP have now dropped any plans to port those to HP-UX. So the fact that we said no, we're going to go to Linux instead, was quite fortuitous. We want to avoid that happening again, and that's one of the reasons we chose Debian: with the more commercial flavors of Linux we could always get burned again if they decide to change something, the way Red Hat changed their licensing and all that sort of thing. If I use Debian, I'm quite safe from that sort of stuff. It's also completely vendor-agnostic, which is great. It means we can buy our tin from absolutely anybody, and of course our head of IT likes that, because he can bargain for whatever good deal he can get from whoever the vendor of the day is. I mean, we don't absolutely go for the cheapest hardware all the time, because that way lies disaster, but it does give us more flexibility. And as I suggested earlier, the open-source philosophy that Debian has, that everything has to be properly free, meshes pretty well with the way the Wellcome Trust wants to work.
I mean, the whole reason the Wellcome Trust got involved in the Human Genome Project in the first place was that there were companies like Celera and Incyte, which for my sins I used to work for, who were trying to sequence the human genome and patent it for money. There was a strong feeling that the human genome belongs to the human race and should not be patentable. It was questionable whether those patents would ever have stood up in court anyway, but nevertheless the public Human Genome Project was set up to start a race, which we won, to get that data into the public domain. Another not-really-technical reason is that there were three Debian developers in the systems team at Sanger already, Simon, myself, and Dave Holland, and we just said, well, we're familiar with this and we like it, so we're going to go with it. And as I suggested earlier, management perception was actually a major factor, because the first time we went to Linux it was Red Hat, since that was the comfortable thing to do for the management. Not for us, but for the management: yes, we can cope with Red Hat, we've heard of that. Of course, Debian's standing in that respect has improved quite markedly since then. But there were also a lot of technical reasons why we liked Debian. Everything just worked, for a start; most of the time, anyway. Large file support was much more complete in Debian than in Red Hat quite early on. I remember at one point Red Hat's tar didn't understand 64-bit file sizes, and neither did gzip even; it was just hopeless. And one of our major wins, of course, is the upgrade path with Debian: you can incrementally upgrade it all the time, whereas with Red Hat, you want to go from Red Hat 3 to 4? That's a reinstall, isn't it? And a major thing for the server side of things was the SAN multipath stuff.
We got that working on Debian years before we managed to get it to work on any flavor of Red Hat, and for the servers, getting proper multipath access to our petabyte-scale SAN storage was a pretty important thing. And of course there's package management: the stuff that originally started with Ian's packaging work, but apt and all the rest of it is just so far ahead of anybody else's, in my opinion, that it's a no-brainer. Another good one that appealed to us was homogeneity across architectures. We are using all four of those architectures in production within Sanger, and we'd noticed, I don't know whether it's still the case, that for a long time Red Hat's Alpha version and their x86 version were not even remotely in sync. Whereas with the whole ethos of the way Debian is built, we have a reasonable guarantee that things are going to be the same on all of the architectures. So that's a major plus. It also gave us the opportunity to have our desktops and our servers actually running the same operating system, most of the time anyway, which makes the life of the programmer much easier, because they can develop their stuff on their desktop and just expect it to work on the servers eventually. And Debian's bug tracking system has got us out of all sorts of scrapes numerous times. Has anybody seen this before? Oh yes, so they have. Now, the challenges that we actually face. As I said just now, we have about a petabyte of storage now, and it's growing fast. The biggest increase in our storage requirements at the moment is a new sequencing technology we're just starting to put into place. One of these new machines produces as much DNA sequence as a hundred of our previous generation of technology, and we're about to buy 20 of the buggers. Each one generates, what's the figure, Simon, can you remember? About a couple of terabytes a week of raw data? Something like that.
Anyway, it's an immense amount of raw data that we've got to find some way of actually doing something with. And of course, as I said at the beginning, the scientists like to keep this stuff forever, to which I think we've finally said: enough. You cannot keep this stuff forever any more; the storage would just cost way too much money. And just how long is it going to take if, God forbid, we have a disaster and need to restore all of this from backup? They also want it all in an Oracle database. We've already got an Oracle database that is 80 terabytes, so I leave it to you to decide, A, how interesting that fact is and, B, how difficult it is to do. Another major problem is getting sustained I/O performance. Most of the bioinformatics algorithms that are run commonly on our hardware are pretty I/O intensive. It's one of the reasons why, whenever people ask me, oh, do you do grid stuff, I swear at the G word, because I hate it. The reason we don't generally do it is that, unlike something like SETI@home, where your job is a fairly small packet of data which you then churn on for hours and hours, in bioinformatics, and genome research in particular, it tends to be a pretty big chunk of data and the job only lasts a few seconds. So the ratio is completely wrong for sending data off over a wide-area network and bringing a result back. So we've got this problem that our high-performance compute cluster has to be able to sustain very high amounts of aggregate I/O. And of course we've got interesting administration problems. In total, certainly by the time I install the new blades next week, I will have almost 2,000 Debian boxes to run.
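As a back-of-the-envelope illustration of that compute-to-transfer ratio (my own hypothetical figures, not numbers from the talk):

```python
# Back-of-the-envelope: when is it worth shipping a job over a WAN?
# All figures below are assumed for illustration only.

def transfer_time_s(data_bytes: float, wan_bits_per_s: float) -> float:
    """Time to ship the input data to a remote grid node."""
    return data_bytes * 8 / wan_bits_per_s

WAN = 100e6  # assume a 100 Mbit/s wide-area link

# SETI@home-style job: small input, hours of compute.
seti_transfer = transfer_time_s(350e3, WAN)   # ~350 KB work unit
seti_compute = 10 * 3600                      # ~10 hours of crunching

# BLAST-style job on genome data: big input, seconds of compute.
blast_transfer = transfer_time_s(800e6, WAN)  # ~800 MB of sequence data
blast_compute = 30                            # ~30 seconds of compute

print(seti_compute / seti_transfer)    # huge: transfer cost is negligible
print(blast_compute / blast_transfer)  # below 1: data takes longer than compute
```

With numbers like these, the grid-style job spends more time on the wire than on the CPU, which is exactly why the data has to live next to the cluster.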
And at the moment, the ISG is four people. There are about 24 sysadmins on the site in total, but more than half of those are dedicated to Windows, because of course it requires a lot more admin time than anything else. So let's talk about Ensembl. This is where the biologists here can go to sleep, because they've probably seen this any number of times, but this is the product that you guys, through Debian, are indirectly helping to create for the scientific research community. What you can see here is a high-level view of a portion of the human genome. You can see this tiny little red window here, you might just be able to see it, which is what's expanded into this panel down here. So this is the whole of human chromosome one, and you can see annotated down here are certain genes. If you scroll down this page, you get things like this: some more detail of the human genome data. Each of these tracks is a particular source of experimental data. For example, these are known human proteins here, and proteins from other organisms; we've got some expressed DNA sequences here; and down here are some predictions from programs which just start with raw DNA and say, does this look like a gene or not? And then what Ensembl's compute infrastructure does is take all of this raw evidence and condense it into predictions of what it really thinks the human genes are on this piece of sequence. As you drill down into this with successively more detail, you get information like exactly how the human genome sequence was built up from the raw sequencing reads. There are things called SNPs here, single nucleotide polymorphisms: places where the human genome varies between individuals, which are therefore important for scientists trying to identify places where there might be a significant change in a disease.
These are color-coded, and it says down here, which you can't read, that for example a SNP coded in red is one that actually occurs within a coding gene up here somewhere, so it gives you some idea straight away of which are the most likely targets. And you can keep going for even more detail. Here we're right down to the base pair level, so the As, Cs, Gs and Ts are up here, with the predicted protein translation of the DNA. And down here you have what are known as restriction enzyme sites. These are sort of molecular scissors: if you as a lab scientist want to cut out this gene and do some experimentation with it, this information down here tells you which particular set of molecular scissors you need to use to get at it. And finally there's a system called the Distributed Annotation System, which doesn't have an associated diagram here. What DAS is, is an ability for you as a scientist, if you have your own data set which you want to layer over this, to set up what's known as a DAS server. I can't remember exactly how it works; it's some sort of XML-based, SOAP-type thing which allows you to layer your own annotation of what you think the genome looks like over the top of this. So here's an example of how that database is actually built. We start with a database of raw human genomic DNA sequence, and we run some basic analysis on it, mostly using the program BLAST, which everybody who's done bioinformatics will know about. For the rest of you who don't know what it is, think of it as fuzzy grep. It effectively does pattern matching for DNA or protein, but it has some knowledge about the biology, so it can say, yes, those two sequences look sufficiently similar that they're probably related to each other.
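To give non-biologists the flavor of that "fuzzy grep" idea, here's a toy sketch of mine. Real BLAST is vastly more sophisticated (seeded heuristics, gapped alignments, biology-aware scoring matrices); this just slides a query along a target and counts matching bases at each offset.

```python
# A toy "fuzzy grep" for DNA, to give the flavor of what BLAST does.
# Not BLAST's actual algorithm: just an ungapped sliding-window match count.

def best_match(query: str, target: str) -> tuple[int, int]:
    """Return (offset, score) of the best ungapped placement of query."""
    best = (0, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        score = sum(q == t for q, t in zip(query, window))
        if score > best[1]:
            best = (offset, score)
    return best

target = "GATTACAGGCTTACGATTGCA"
query = "TTACG"
print(best_match(query, target))  # → (10, 5): exact hit at offset 10
```

The "some knowledge about the biology" part is what this toy lacks: BLAST scores substitutions by how biologically plausible they are, rather than treating every mismatch equally.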
So, each of those boxes you saw on the Ensembl diagram is effectively the result of a BLAST job. That gives you what they call feature annotations; then they take some expressed sequences, you don't need to know what those really are, and they mash it all together in what they call the gene build pipeline, which produces a set of gene structure predictions. This particular compute pipeline, the one here that they call the raw compute, is very I/O intensive. So here's an example: 10 query sequences against an 800 megabyte EST dataset. This is quite an old slide; I did this on an AlphaServer ES45 running Tru64, with a very fast disk system, but it's still only getting 54% CPU time. It's I/O bound. These things require a lot of I/O, and they're doing it all the time; it's not as though they read the dataset and then churn away, it's pretty much continuous. And that entire job took a couple of minutes. The gene build pipeline isn't quite so bad; it uses rather more compute-intensive algorithms, and so it scales rather better. But they are both classically parallel, or what we call embarrassingly parallel, problems: you've just got 100,000 query sequences that you want to compare against the whole genome, and they're all independent of each other. So this is what the gene build pipeline looks like. I'm not going to go into any detail with this; if any of you are vaguely interested, I can have a go at bullshitting my way through what it actually does. Each of these rectangles is a complete set of jobs in itself. And here's an example of how many jobs this actually takes. I did this quite a long time ago, but when we were on NCBI release 33 of the human genome, this pipeline consisted of 13,500 jobs.
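The embarrassingly parallel pattern is really just splitting the query sequences into independent chunks, one batch job per chunk. A minimal sketch (the helper names are hypothetical, not the actual Ensembl pipeline code):

```python
# Minimal sketch of the embarrassingly parallel pattern: split a big pile
# of independent query sequences into chunks, one cluster job per chunk.
# Hypothetical helper names; not the actual Ensembl pipeline code.

def make_jobs(queries: list[str], chunk_size: int) -> list[list[str]]:
    """Split queries into independent chunks; each becomes one cluster job."""
    return [queries[i:i + chunk_size] for i in range(0, len(queries), chunk_size)]

queries = [f"seq{i:06d}" for i in range(100_000)]  # 100,000 query sequences
jobs = make_jobs(queries, chunk_size=100)

print(len(jobs))  # → 1000 independent jobs of 100 sequences each
# Each chunk can then go to the batch scheduler independently, e.g.:
# for n, chunk in enumerate(jobs):
#     submit(f"blast_{n}", chunk)   # submit() stands in for your scheduler's API
```

Because no chunk depends on any other, the only limits on throughput are the number of nodes and, as described above, the I/O they can sustain.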
And if we just ran it in series on a single one of our compute nodes of the time, an 800 MHz Pentium III, it would take about one CPU year to do the entire calculation. Now, one thing I didn't point out is that the whole Ensembl website is regenerated from scratch every two months, for all 25 genomes that we do, using all the latest evidence, which gives you some idea of why we need the amount of compute we need. So here we go, some hardware porn for you all to drool over. This was our very first cluster, running Tru64 because we hadn't quite made the leap to Linux at that point: 460 AlphaServer DS10Ls. Now, in order to get the sort of aggregate I/O bandwidth we need for BLAST to go quickly, what we had to do was put a 40 gigabyte disk in every single node, and the static data sets we needed to search were just replicated onto every node, so each node searches on its local disk. That's a bit of a pain in the backside, especially when you're doing it over 100 megabit Ethernet. And it got worse. This was when I joined the Sanger Institute, and my first job was this: at the time we didn't have that nice big data center, and the head of IT said to me, right, we've got two 19-inch racks, and we want you to fit as much compute into those two racks as you can squeeze in. And this is what we did. First of all, we cheated and bought 54U racks. And we bought 768 RLX blade servers; you can see there are 24 of them in each 3U chassis here. Each one is an 800 MHz mobile Pentium III with a gigabyte of memory and two 40 gig hard drives. Now, this unsurprisingly has some reliability issues. There are one and a half thousand drives in those two racks, and we have to copy what was by this time about a 70 gigabyte data set across to all of them, and because it was a 70 gigabyte data set, we're using a RAID 0 stripe across the drives as well.
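A quick aside on why that RAID 0 striping compounds the reliability problem (the failure rate here is an assumed round number for illustration, not our measured figure):

```python
# Rough reliability arithmetic for RAID 0 pairs; illustrative numbers only.
# Assume each drive independently has a 5% chance of failing in a year.

p_drive = 0.05                   # assumed annual failure probability per drive
n_nodes = 768                    # RLX blades, each striping across 2 drives

# A RAID 0 pair is lost if EITHER drive fails.
p_pair = 1 - (1 - p_drive) ** 2  # ~0.0975: nearly double the single-drive risk

# Expected number of nodes losing their local data set per year.
expected_losses = n_nodes * p_pair
print(round(p_pair, 4), round(expected_losses, 1))
```

Striping doubles the throughput but also roughly doubles the chance any given node loses its copy of the data set and needs it pushed out again.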
So that makes the reliability even worse. Plus other interesting problems: we'd now got over a thousand nodes in the farm, and a thousand nodes trying to connect to a MySQL server will destroy it, especially if they're trying to pull reasonable amounts of data. So this was getting a bit unmanageable, and we thought, right, we need to buy chunkier, what we called fat blades. For the next generation we bought 168 of these IBM HS20 blades. Each one is a dual 2.8 GHz Pentium 4 Xeon with four gigabytes of memory, which allows us at last to cache the entire human genome in RAM. That makes things a lot faster, and having dual gigabit networking helps as well. And we were starting to experiment: you can see in the middle of this set of three cabinets we were experimenting with some SATA RAID arrays, to see if we could avoid having lots of individual spindles. It turned out at this stage that the SATA RAID arrays didn't really work; their performance didn't scale very well. Then generations four and five. We bought yet another 280 of the same blades as last time, well, similar, the latest version of them, 3.2 GHz now. These things run hot; if you stand behind this rack, you know about it. But this time they're 64-bit blades, so we're running the 64-bit version of Debian on them. And then we bought, not two 2 GHz Opterons as the slide says, but 200 of them; that's what happens when you prepare the slides on the day of the talk. And we finally started using Lustre, which gave us a final solution to distributing this enormous data set to all of these nodes. We at last have a global file system across all of the nodes, where this static data set lives. So no more data pushing, with all its time-wasting and general flakiness; it was a hideous Perl script around rsync that I'd written, and it's not a pretty piece of code.
And the theory behind Lustre, which Simon is going to start talking about in a moment, is that it's inherently almost infinitely scalable: as you require more bandwidth, you just add more nodes to it. Oh, and then we bought yet another lot, 140 more. So now I hand over to Simon to talk about the Lustre-y bit, and then he'll hand back to me once he's done. Thank you, Tim. So, as Tim has very ably explained, we had these problems with storage, and we wanted to get away from the situation where we had a big chunk of essentially static data that we were replicating across lots of machines in our compute farm. We also had the problem that we were using NFS to provide consistent storage for working data, scratch space and writable storage, and that was permanently on the edge of breakdown, because an NFS server can't really cope with 400 or 500 clients all hitting it at the same time. So we looked around at all the cluster file systems which were available at the time, which was two or three years ago, when it was only just becoming possible to do this stuff. There were various file systems available, most of which were proprietary, and some of which were available as free software. We eventually came across Lustre, which at the time was using the Ghostscript model, where old versions were GPL and the current version was proprietary; it's now GPL for the current version too. And it has the great advantage that, as far as data access is concerned, it's parallelisable to the nth degree, because the architecture is that the metadata for the files, that's directory entries, data about block allocation and all of that stuff, exists on one set of storage and one set of servers, but the actual data is smeared across as many of what are called object storage targets, which are basically networked machines with disk behind them, as you want, more or less.
And this works in two ways. You can either work in a mode where whole files are distributed between these targets, so any individual file is only on one target; that tends to work best when you have large numbers of relatively small files. Or, if you have big files, or files which are being accessed from a lot of different places at once, especially read-only, you can take a file and stripe it across all of the object storage targets: the first megabyte of the file is on the first object storage target, the next megabyte on the next, and so on. So the available network bandwidth is multiplied by the number of storage targets that you have. So we decided that we would go with the Lustre file system, and then we had another interesting question: should we use the free software version, or should we take something called SFS, which is Hewlett-Packard's productized version of Lustre? In the end we decided to do the latter, which was a very interesting and difficult trade-off. On the positive side, and in fact the reason we did it in the end, is that HP have taken Lustre and implemented it on their hardware in such a way that you can build the whole cluster in a very resilient manner. You can build one of these clusters with an active/passive pair of metadata servers, so that if one of them fails, the other takes over completely transparently; basically no I/O operations will fail. And if any of the object storage servers fail, well, it's not indicated on this diagram, but each of the disks at the back end here is actually dual-ported to two of the object storage servers, so that if one of them fails, the other one in the pair takes over access to that data.
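The striping arithmetic Simon describes can be sketched as follows (a simplified model of mine; real Lustre stripe counts and sizes are configurable per file):

```python
# Simplified model of Lustre-style striping: which object storage target
# (OST) holds a given byte offset, assuming a round-robin 1 MiB stripe.
STRIPE_SIZE = 1 * 1024 * 1024   # 1 MiB stripe, as in the example above

def ost_for_offset(offset: int, n_osts: int) -> int:
    """Round-robin placement: stripe 0 -> OST 0, stripe 1 -> OST 1, ..."""
    return (offset // STRIPE_SIZE) % n_osts

# With 4 OSTs, the first four megabytes land on OSTs 0,1,2,3 and then wrap:
print([ost_for_offset(mb * STRIPE_SIZE, 4) for mb in range(6)])  # → [0, 1, 2, 3, 0, 1]

# Aggregate streaming bandwidth scales with the number of OSTs:
per_ost_MB_s = 100              # assumed per-server throughput, illustrative
print(4 * per_ost_MB_s)         # → 400 MB/s aggregate for a 4-wide stripe
```

This is also why striping helps streaming reads but not small random I/O: a sequential reader keeps all the OSTs busy at once, while a random reader mostly bounces between them one request at a time.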
None of that was available in the GPL version of Lustre without an awful lot of implementation effort, which we frankly didn't have the resources to do, whereas HP took this stuff and put it on their hardware, and you can buy the whole lot as a cluster, and if you ask HP really nicely and pay them enough money, they'll ship the thing into your data centre in a rack, all configured and cabled up and everything. So in the end we decided that we would take the HP SFS proprietary version of Lustre rather than the free software version of the same thing, which gave us all of those big advantages, but unfortunately we then wanted to run it on Debian, and HP were only providing support for SUSE and Red Hat. So here's a diagram of the kind of network arrangement that we've got for connecting these things together. You can see here there's the servers, and we have a core of 10 gigabit Ethernet, and then there's a couple of different gigabit Ethernet ports which connect... sorry, I've got this back to front: here's the servers down at the bottom, we have 10 gigabit in the middle, and then lots of clients up here. And I have some graphs of what we've seen in real life. Over the 10 gig core we've been seeing consistent data rates out of our storage servers of two to three gigabits when pushed hard, and the bottleneck on this stuff is the number of disks and the number of object storage servers that we currently have. The current plan is to double the number of disks behind this stuff and double the amount of storage, so we've finally solved the problem of not being able to access all of this data fast enough for the number of compute nodes that we have. Okay, this stuff isn't perfect; there are problems with it, because metadata operations all go through a single metadata machine, so they don't scale well, and in fact it's worse than that, because at least some metadata operations, including creating files, also
need to touch the object storage servers which are handling the files, so there's a big cost in striping individual files over multiple object storage servers; don't do it unless you really need to. Yeah, random access is slow on these things for a very similar reason, especially for files which are striped over multiple servers. For much of what we're doing that's not a problem, because the BLAST searching that Tim was talking about basically does streaming access to files, so that works very fast and very well; but it's difficult to use DBM files to implement databases over this stuff, for instance, so we're still using NFS for some things. Not all things: we've replaced most of NFS, at least on our compute farm, with Lustre. So this comes back to the problem we have with HP SFS: because we've committed to using SFS rather than the open-source stuff, we're now third in line for the software. A new GPL version of the software is released; HP take it and do their development and quality control on it, and then they pass it to us; and then, because we want to run on Debian, we have to take their client kits and their client code and adapt it to Debian. So there's a kind of three-stage cascade of new releases of the software, which means that we're running quite old software: at the moment we can't run anything later than a 2.6.5 kernel, which means that we're going to have problems moving to etch until we've waited for this cascade to come through. So, I've mentioned much of this stuff already: we chose the proprietary version because of the integration with HP hardware, but we have to do the Debian support ourselves. And the other problem is that HP have taken GPL Lustre and they've made modifications to it.
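The metadata bottleneck described above can be sketched with a toy model. The rates below are made up; the point is the shape of the scaling, not the numbers.

```python
def create_time_secs(n_files, mds_creates_per_sec):
    """File creates all serialise through the single metadata server (MDS),
    so extra object storage servers don't help at all."""
    return n_files / mds_creates_per_sec

def stream_time_secs(total_mb, n_oss, mb_per_sec_per_oss):
    """Streaming reads parallelise across the object storage servers."""
    return total_mb / (n_oss * mb_per_sec_per_oss)

# With hypothetical rates: streaming 1000 MB across 4 servers at
# 50 MB/s each takes 5 s, and doubling the servers halves that; but
# creating 10,000 files through an MDS doing 1000 creates/s takes 10 s
# no matter how many object storage servers you add. And since creating
# a striped file must touch every server holding a stripe, wide striping
# makes creates dearer still.
```

That asymmetry is exactly why streaming BLAST-style workloads fly on this setup while metadata-heavy workloads, like DBM files, do not.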
HP have contracts with the US government to run Lustre on some of the big Department of Defense clusters, so they do a lot of tuning and a lot of changes to this stuff, which means we can't even share the effort that we're making Debianising it with other people who are using GPL Lustre, because what we're using is different from GPL Lustre by the time it's been filtered through HP. So, the first part of the Sanger's infrastructure that actually went onto Debian, well, onto Linux in general, was actually the desktops, and that was done by a separate team from us, what we call our operations group. So for a while we had completely separate installation and management infrastructure for our desktops and our servers. For example, the desktops were using a fairly standard Debian preseeded install and then using cfengine to manage the configuration, while we in ISG were initially using vendor-specific tools: when we had the RLX stuff we were using their Control Tower product, which basically PXE-boots the machine, formats the disk and unpacks a tarball onto it, an image sort of distribution. But more recently, once we'd bought different vendors' hardware, we had to use something that didn't care which vendor we were using, and that's why we started using FAI. We've now unified it, which is what I've been spending most of the last couple of months on: we now use FAI to install everything, desktops and servers. We also now use cfengine to configure everything (well, not Tru64, but every Linux flavour is configured with cfengine), and we have a local apt repository for certain customised packages that we want. So what have we done with FAI? Well, we've now got whatever the stable version of FAI in etch is, 3.1.8 isn't it, and we're installing
both 64-bit and 32-bit clients off the same server, which is nice. We had to do a few interesting little tricks because of the way our network guys insist that we deploy desktops: a machine always has the same host name regardless of which bit of the network we plug it into, so we're using dynamic DNS, but the machine has to have a fixed host name. So I've been modifying FAI so that we can actually store the name of the machine as the asset tag in the desktop's BIOS; we grab that during the FAI install and use it to define the name, but we stick with the DHCP-configured IP address, and that works quite nicely. One of the reasons that we chose FAI, and like it so much, is that it's so easy to just get in there and do stuff with it, especially if you're trying a new piece of hardware you've never done before and it all falls over: you can get in there with SSH and prod about and find out why it's gone wrong. Our principle is that although FAI has some very strong stuff for actually configuring the machine, we don't generally use that. The way we're using FAI is basically to get the machine to its bare minimum usable state, so its network is working and its host name is configured, and then we hand the whole of the rest of it, including installation of most packages, over to cfengine. So later on we don't use FAI's softupdate method for updating things; all package management subsequently is done with cfengine. Typical compute node install time with FAI: I like showing this to Windows administrators, because they just can't believe that it takes less than five minutes to install a server. The desktops take 30 minutes, and I reckon that's almost entirely GNOME's fault, but there we go. And we use cfengine, as I said, for configuration management across the board. I'm starting to have itchy feet about cfengine; I'm not wild about it.
I'm quite interested in talking to anybody who's tried Puppet as a replacement for cfengine... there's some shaking of heads; it just looked interesting. And when we do do package management with cfengine, perversely, even though I said we don't use FAI to do the package management, I actually stole FAI's idea for it: we've just got a list of package names that we can throw at aptitude and say, there, go and do it. Plus, our FAI server and our cfengine server are on the same piece of hardware, which makes an enormous difference: the whole lot of our configuration information is all in one place, all under the same CVS control. Yep, and the local repository, going as quickly as I can now because my time is up: we're using an apt repository made with debarchiver, which allows us to shove all sorts of nice things in it. The sorts of things we're doing are custom kernels, Lustre client kit packages, my own patched version of cfengine (I had to patch cfengine a bit), and some clustering software: special versions of heartbeat, a cluster-aware version of cron, things like that. What are we doing in the future? Going as quickly as possible: we're about to move the desktops to etch; server migration is slower, as you've been told, because Lustre support for etch is going to be a problem; we're going to replace all the remaining Tru64 stuff with Debian; and we'll have a look at parallel NFS, version 4, and I'd be very interested in talking to anybody who's tried that. And, oops, that was my Debian wishlist, which you now can't see, and which I don't really need to show you because I've already talked to Ian about this: we'd really like a sort of version of dpkg that we could use to allow programmers to install their own packages without needing root access, but still hopefully being able to depend on certain things that are installed on the system itself. So anyway, I've raced through
the end, but I've finished. Have you got any questions, or are you all now asleep? There's one there.

I can't patch dpkg, but I'm running this Debian-Med stuff; can we do anything for you, perhaps? Can you use the stuff from Debian-Med?

Sorry, what's it called? Debian-Med, this medical distribution, which also caters for biologists? Okay, so the packages I'm talking about here are not actually the standard things like HMMER and ClustalW; it's locally written software. There are about 100 or so programmers working on the development of the genome processing stuff, and they've got their own bits of software with all sorts of complex interdependencies, and we really want dpkg-like dependency tracking, so that if they upgrade something, they know we can't do that until something else has been upgraded as well. That's what they're after. Do you have anything like that?

Yeah, but the problem is, can we do anything? The admin side of what you described here is a little bit deeper than we can provide regarding packaging. But because you said you use some software which is not in Debian, can we do some packaging which would make your work easier, or something like that?

Well, it's worth us talking about it offline, I think. Okay, yeah, I think we'll do it later.

I'm Luca Capello, biologist by profession. Just curiosity: what's the situation with the other genome sequencing centres? Do they use Debian, or whatever else?

Well, most of them used to use Tru64 like we did; I don't know what they're switching to now, actually. Celera went to IBM and POWER on AIX, or so we're being told, but I don't actually know what they do. We have protocols for exchanging the data, but of course from that I can't tell, because that big terabyte-scale Oracle instance has all of their data as well as ours, but I don't know how they generate it.