Welcome to another edition of RCE. I'm your host, Brock Palen, and I have with me again co-host Jeff Squyres from Cisco Systems and the Open MPI project. Jeff, thanks for joining again.

Good afternoon, Brock. How's it going?

Pretty good. Today we have an actually highly requested topic. It took us a little while to get them on, but they're here now: we have with us Andreas from Lustre, the parallel file system. So Andreas, welcome to the show, and tell us a little bit about yourself.

Hi, I'm Andreas Dilger. I've been working on Lustre for about seven years now, basically since the beginning. I was previously involved with ext3, I'm still one of the ext4 maintainers, and I've been doing file system development for a large part of my career. The other thing to mention is that I work for Sun Microsystems, and I'm currently one of the architects of Lustre.

Okay, could you give us a quick summary of what Lustre is and what its target audience is?

Lustre is a distributed file system that's used quite a lot in high-performance computing. It's essentially a clustered client-server file system that allows the storage capacity and performance of a single file system to be scaled by adding additional server nodes. The main driver for Lustre is really scale: we usually run on seven of the top ten systems in the world, and the target audience is really people who can't meet their file system needs with a single large file system server. If you can do what you want with a big NFS filer, or a big NFS server from EMC or somebody like that, that's probably a good solution; Lustre takes off from there, when you need two or ten or a hundred times the bandwidth and capacity that you can get from one physical server.

You mentioned EMC-like NFS appliances. Is Lustre an appliance, where you buy the hardware and the software all integrated, or is it just software that you run on top of white-box hardware?

It's a little bit of both. Lustre itself is just software, like NFS; it's largely protocol and implementation. It's not standardized like NFS, but we run Lustre on a wide variety of hardware, and it can use basically any back-end disk storage that appears to Linux as a device: hardware RAID, software RAID, Fibre Channel, SCSI, SATA disks, whatever. We don't really sell software; it's open source, and we make Lustre available to whoever wants to use it. Depending on your requirements, people use it as an open source product and install and manage it themselves on white-box hardware, or you can get it through resellers: HP and Cray and Dell and DDN, a number of different partners that take Lustre, configure it and test it on their hardware, and then provide first-level support. Some of the vendors are looking more towards appliances; HP in particular sells what they call SFS, their Scalable File Share.
They essentially sell Lustre appliances. Most of the other vendors sell it as part of a complete package; Cray, for instance, includes Lustre on their XT3, XT4, and XT5 systems as the base file system.

Okay, so it spreads the whole range. You mentioned that Lustre targets scale, and that gets a little confusing sometimes with disk systems: you can talk about large files, large total file system size, high bandwidth, or high IOPS for a large number of small files. Which of those targets does Lustre actually hit?

It actually does hit most of them, though it doesn't necessarily do the best in the case of large numbers of small files, because that isn't really the market that has been driving the development of Lustre. We definitely target high total throughput and large numbers of large files; the bigger the file system, in general, the more likely it is to be using Lustre. Once you get into high counts of small files, the current implementation of the Lustre protocol isn't really optimized for that, because we do a lot of work to optimize the I/O requests for large I/Os and to get the maximum bandwidth. So small files are one of the weaker areas of Lustre, but people still use it in a general environment, for home directories and things where the files aren't necessarily huge.

So Andreas, how did Lustre actually get started? What's some of the history?

Actually, before I started working on Lustre, back in 1999, we started doing a project for Seagate related to object-based disk storage. At the time the intent was more along the lines of replacing SATA or Fibre Channel, although I think that was probably before SATA existed; it was essentially replacing SCSI with Ethernet. You'd have object-based disks, the file system layer would just talk to inodes on the disk, and the disk would handle all of the allocation internally. We worked out a prototype based on ext2, sort of split in half, with the inode and block storage layer underneath and a file system level on top. But that project got cancelled; there wasn't really interest in developing a whole new interconnect protocol at that time, so it sat on the shelf for a few years.

Then Peter Braam, who had been the head of that project, kept tossing the idea around, and he took it from being a single-disk kind of file system to being an aggregate of these object-based disks. So it went from a small-scale, one-PC kind of system to the level of handling high-performance file system I/O. At the time there was a project called ASCI Path Forward that was looking for next-generation storage solutions for their upcoming supercomputers, and Peter Braam got a contract developing the file system for ASCI Path Forward. We were partnered with Intel and Hewlett-Packard to take this prototype object-based file system and grow it into a client-server file system.
It has cache-coherent locking, and it can scale I/O as linearly as possible as you add devices into it. From that point we stopped working on individual disks, because you have to implement the Lustre protocol in software, so you essentially have a PC sitting in front of the disks. To mitigate the cost of having that PC implementing the protocol, we started putting big RAID arrays behind it; instead of individual object-based disks, we now essentially have object-based servers. That was really the beginning of Lustre. It was actually a five-year contract to implement prototypes and progress the design through various stages of development.

One of our early development partners was Lawrence Livermore National Laboratory. They weren't doing any of the development per se, but they helped us a great deal by providing systems for testing, and actually a really good development environment, because even on their largest clusters they have serial consoles and crash dumps and everything like that. Once you encounter a problem with your system, you can actually debug it in a reasonable manner; without kernel crash dumps, the development of a kernel file system, especially a network file system, is significantly hindered. And because of the scale at which we need to test, this isn't really something any individual can do at home: you really need the large systems in order to catch race conditions, loading problems, and things like that. Our test cluster was essentially a thousand-node system at Lawrence Livermore that we ran on for quite a few months before we could get a reasonably stable and usable file system.

So, being a network guy, you said the magic N-word in there, and I just have to ask: what kinds of networks does Lustre support? What do you run on, and what transports do you use?

Lustre is actually different from a lot of other network file systems in that, instead of just using TCP for the transport and building everything on top of an IP network, Lustre implements its own networking stack, now called LNET. It does work on top of TCP, but Lustre also uses native InfiniBand through the OFED stack. In the past, before OFED existed as a standard, we used to support all of the different vendor InfiniBand stacks, from Cisco and others; there were four or five different InfiniBand stacks.
We also support the Cray internal network, called SeaStar, and Myrinet, and a bunch of other high-performance, low-latency networks used in supercomputers. Really, one of the reasons Lustre can scale as well as it does and get the performance it does is the implementation of the networking protocol we have. In that regard we're very fortunate to have top-notch, experienced network developers (initially a single networking developer, but now we have a networking team) who can really squeeze the last bits out of each protocol that we sit on top of. Lustre uses RDMA on basically all of the network types except TCP, which doesn't support RDMA, and that makes the protocol much more efficient over the network.

So let me ask you then: you mentioned RDMA, and I assume heavily influenced there is the hardware offload, as opposed to being driven by software. What kinds of tricks do you do to get the high performance? You mentioned that you're optimizing for maximum bandwidth; what kinds of things do you have to do in order to give those kinds of guarantees?

Well, for TCP, one of the tricks we use is that the inter-node connection at the TCP level actually uses three different sockets. We have one socket for low-latency small message passing, with the TCP options tuned appropriately for that, and then one socket for large message sends and one socket for large message receives. That allows us to tune each of those connections independently. Things like Nagle are really bad for low-latency messages, because the Nagle algorithm deliberately adds delay to a message in order to try to aggregate messages together; that hurts your protocol if it means you can only send 100 or 200 messages per second, because you're not going to get any better performance than that.

We do similar things on the higher-performance interconnects. They also have the concept of sending small messages very efficiently without RDMA, while larger messages require RDMA. So we try to pack the Lustre protocol messages into the smaller request size, and only when the message is large, say a request carrying a 4K path name or something, does it need to set up a little RDMA transfer. Generally we try to utilize the most efficient messaging type on each network that we support.
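To make the Nagle point concrete: on Linux, per-socket batching is controlled with the TCP_NODELAY option, so a latency-sensitive RPC socket can be tuned differently from the bulk-transfer sockets. This is only a minimal sketch of the general technique, not Lustre's actual socket code:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Tune a connected TCP socket for small, latency-sensitive RPCs by
 * disabling the Nagle algorithm, so each small message is sent
 * immediately instead of being held back and coalesced with later
 * ones. Bulk-data sockets would leave Nagle enabled and instead
 * rely on large buffers for throughput. */
static int tune_low_latency(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```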
So you have these different things floating around. Can you give a little overview of what an MDS is, an MGS is, and an OSS is? Which one is responsible for which type of performance, and which ones can you have multiples of?

Okay, so there are three types of servers; the names you mentioned are each different server types. The MGS is the management server. It handles configuration of Lustre clients and servers; it essentially holds files with individual configuration records in them. When a client first mounts, you give it the MGS name and a file system name, and it connects to the MGS and pulls the configuration log: which servers are part of the file system, and various configuration parameters such as the maximum RPC size, the maximum cache size, or whatever tunables there are for the file system. It gets those from the MGS.

The second one, the MDS, is the metadata server. It's responsible for handling all of the path name lookups, permissions, quotas, and things like that; essentially what most people think of as the file system. Your directory listings and so on are all handled on the MDS. On a single MDS, which is the name of the server node, there can be multiple MDTs, metadata targets. Those are essentially individual file systems exported from a metadata server. In the current production versions of Lustre, a metadata server can export multiple metadata targets, but each of those targets has to belong to a separate file system. So if you have, say, a scratch file system and a home file system, you can serve both of those off a single metadata server, but in the current releases you can't scale up your metadata performance by adding multiple metadata targets to a single file system.

The third type of server Lustre has is the object storage server, the OSS. Again, it's a server node, and it exports the file data for all of the files in the file system. Each OSS node can have one or more OSTs, object storage targets, which are the individual file systems that store the objects holding the file data. Unlike the MDS and MDTs, you can add essentially as many OSSes and OSTs as practical to a single file system, and that's how you scale your capacity and bandwidth with Lustre. Typical numbers would be at least four or more OSTs, just because of scale; you don't necessarily want to use Lustre for the tiniest systems. It goes up as high as about 1,400 OSTs on about 200 OSS nodes in the largest production system today. Once you have that many servers going, the clients connect directly to each of the object storage servers and pull their data directly from them, without involving the metadata server, and that allows you to scale your bandwidth directly as you add more OSSes and OSTs to the file system.
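As a concrete illustration of the client-mount step Andreas describes, a Lustre client is pointed at the MGS plus a file system name; from a shell this is typically something like "mount -t lustre mgsnode@tcp0:/scratch /mnt/scratch". The equivalent through the mount(2) system call might look like the sketch below; the node name "mgsnode" and file system name "scratch" are hypothetical, and the exact device-string form can vary by release.

```c
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* "mgsnode@tcp0" names the management server and its LNET
     * network; "scratch" is the file system name registered with
     * that MGS. Both are made-up names for this example. */
    if (mount("mgsnode@tcp0:/scratch", "/mnt/scratch", "lustre", 0, NULL)) {
        perror("mount");
        return 1;
    }
    return 0;
}
```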
But adding more OSSes and OSTs actually does other things too, because you're adding network ports if those are a bottleneck, you're adding CPU if that's a bottleneck, and you're adding Linux disk cache, with more memory on every one of these servers floating around. So unlike an appliance, where you just keep adding shelves but you're still limited by however many network ports or however much performance the head has, with Lustre you just keep tacking on servers and you add performance across all those pieces.

Yes, by all means. One of the fundamental design goals of Lustre is that each of the OSTs in the file system is completely independent: they don't communicate with each other, and they don't need to coordinate anything. So, like you said, you add in a new OSS node and you get as many additional network channels as you want. When we're the ones doing the system integration for Lustre, we try to design the OSS nodes so that their network bandwidth is roughly balanced with the I/O bandwidth of the back-end disk storage, and we generally target getting about 95 percent of the raw disk bandwidth and 95 percent of the raw network bandwidth when we're setting up systems.

And, like you said, you can also add RAM: the larger the RAM, the more metadata you can cache, and in the 1.8 release of Lustre you can start doing read caching on the server. There's an interesting little tidbit there: in days gone by, the kernel was actually slowing down the performance of the file system by putting data into the cache. While that works fine on your desktop, consider that some of the larger Lustre file systems have 10,000 or 20,000 clients connected; you can blow through gigabytes of RAM in just a few seconds, so it's not practical to cache data for reuse, because it just disappears so quickly. Until very recently it was actually slowing down file system I/O to even put the data into the cache. With the improvements in newer kernels, and the fact that CPU cores are a lot faster, we've started reintroducing the ability to cache data on the server, so in the 1.8 release you can do read caching of data.

This actually leads straight into a question I wanted to ask about the whole multi-core crisis, or multi-core phenomenon, or multi-core boon, depending on who you talk to. How does that affect a file system, particularly since Intel has finally made the plunge and gone NUMA? How do these architectures play into how you design your algorithms for maximum bandwidth and quickest response? You effectively have multiple layers of RAM, and you might have hundreds of gigabytes of RAM effectively available for caching. You just talked about how sometimes that's not a good thing, but sometimes it probably is. How do you tell the difference, and how do you handle all of this?
Well, the good news is that Lustre has been running on high-end systems for quite a long time, so NUMA and multi-core systems have pretty much been the regular hardware that Lustre runs on. Especially on the server side, which is highly multi-threaded, it works pretty well. We do have NUMA support for some of the allocations, to try to keep them local to the threads that are running, and the service threads are bound to particular cores so that they keep their cache locality.

On the client side it's quite a bit harder as we get to multiple cores. Basically, I think everybody agrees that individual cores are not going to get any faster, and in fact they may start getting relatively slower. That's a problem for certain types of workloads: if you have a copy running in only one thread, you're bandwidth-limited in how fast you can copy data from user space into the kernel. That's actually a significant problem these days; as you get up to a gigabyte or two gigabytes per second, you're totally saturating one CPU just doing the memory copies from user space to kernel space. In workloads where you run one thread per core, which is pretty typical for HPC, it's not a problem. But we are discussing internally some ways to multi-thread the memory copies from user space even when only a single process is doing a write call, though there's a certain upper limit on how much that can help before you start getting other problems due to locking and things like that. If you have big, multi-megabyte write calls, you can still multi-thread that and chunk it up into smaller pieces that get copied in parallel, but it's not going to work for every kind of workload.

We are also still working on improving the locking, the SMP locking, as the number of cores increases. In recent testing we found that as you get beyond about 16 cores your performance starts to taper off, and past 32 cores the lock contention gets too high in certain parts of the Lustre I/O stack. That's something we're working on now, so that by the time those systems become more common, hopefully we'll have it addressed.
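The chunk-and-copy-in-parallel idea Andreas mentions is easy to picture with plain pthreads: one large copy is split across worker threads so no single core has to move all the bytes. This is only an illustrative sketch of the concept, not the Lustre client's code; the thread count and buffer size are arbitrary.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 4

struct chunk { char *dst; char *src; size_t len; };

/* Each worker copies one contiguous slice of the buffer. */
static void *copy_chunk(void *arg)
{
    struct chunk *c = arg;
    memcpy(c->dst, c->src, c->len);
    return NULL;
}

/* Split one large copy into NTHREADS pieces done in parallel,
 * mimicking how a single big write() could be spread over cores. */
static void parallel_copy(char *dst, char *src, size_t len)
{
    pthread_t tid[NTHREADS];
    struct chunk c[NTHREADS];
    size_t piece = len / NTHREADS;

    for (int i = 0; i < NTHREADS; i++) {
        c[i].dst = dst + i * piece;
        c[i].src = src + i * piece;
        /* The last thread also takes any remainder bytes. */
        c[i].len = (i == NTHREADS - 1) ? len - i * piece : piece;
        pthread_create(&tid[i], NULL, copy_chunk, &c[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
}

int main(void)
{
    size_t len = 64 << 20;              /* 64 MiB test buffer */
    char *src = malloc(len), *dst = malloc(len);

    memset(src, 1, len);
    parallel_copy(dst, src, len);
    free(src);
    free(dst);
    return 0;
}
```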
So, to come back up a little bit: you mentioned that the largest production system has over a thousand OSTs across 200 OSSes. Can you give us an idea, and I guess there are a couple of different measures of "largest" here: what's the largest file system by size, and what's the largest by bandwidth?

They're both the same one, actually, and it's publicly known; they've made a press release. It's Oak Ridge National Laboratory. They have a 10-petabyte file system, 10.3 petabytes or something like that, and in their acceptance testing they were getting 240 gigabytes per second for read and write. I don't think it ended up being that many clients generating that I/O, on the order of a few hundred, but that file system is connected to 26,000 clients in total.

Wow, so pretty much the biggest in every dimension.

Yeah, pretty cool.

So, how does Lustre handle a file system failure? Actually, before we get to that: is Lustre just a networking and server layer on top of a file system we're all familiar with, kind of like NFS, or does it have its own file system underneath?

One of our early design decisions with Lustre was that we weren't going to develop our own on-disk file system. Getting a disk file system to work reliably takes five years or more, and that just wasn't an approach we could handle given there were three people in our company when it first started. So we started with ext3, in order to use the journaling and atomic-update facilities in ext3, and over time we modified ext3 with different features: extents, a better allocator, more efficient extended attributes, and things like that. We've stuck largely with the ext3 code, and a lot of that work has been pushed upstream into ext4 now.

Lustre is generally like NFS in that regard: it re-exports individual file systems, and each of those file systems is a completely coherent local file system that you can mount and dig into if you need to. It's really the clients that do the work of aggregating the metadata server and the files striped across multiple object storage servers, combining all of that into the one file system that the user and applications see. As for how Lustre handles failure: fortunately, because each of these file systems is completely independent, and they're local disk-based file systems that are pretty robust themselves, it's possible to run e2fsck on each of them in parallel. And if problems occur during runtime, Lustre can run with failed storage servers. With the metadata server down, your file system isn't visible, so it's not really possible to run in that mode, but with failed OSTs it's possible to continue using the file system.

So let me ask you a question from my own bias: I'm an MPI guy. All the MPIs out there have various levels of support for different parallel file systems. I wonder if you could give a little explanation of what exactly it means for an MPI to tie into a parallel file system, because with MPI-2 they introduced the concept of parallel reading and writing. Maybe you could describe what Lustre does to support the various MPIs out there and how to extract good performance; what metrics are you trying to optimize for MPI kinds of workloads? Granted, there are oodles of different kinds of MPI workloads, but at least there's some commonality in saying "an MPI workload" versus other kinds of parallel workloads.

Yeah. In the HPC world there are generally two kinds of I/O workloads. One is called file-per-process: if your job has a hundred tasks, you get a hundred output files. The second kind of I/O workload is called shared single file: even though you have a hundred tasks running in your MPI job,
they're each writing to some pre-allocated part of a single file, usually at their rank number times some offset. Lustre can handle both of those workloads fairly well. As you would imagine, file-per-process gets the best performance, because there's no contention between the files; Lustre handles all of the locking separately for each file. With shared-file I/O you have to lock the file and the data so that it's coherent between all of the clients, and that adds some overhead.

What's interesting in newer clusters, as you scale up the number of cores into the hundreds of thousands (on the biggest systems, like Jaguar at Oak Ridge, they have 256,000 cores or something like that), is that it starts to get to the point where you don't want to create one file per process. You just don't want 250,000 output files for every hour of computation, or whatever it is you do. So they have started shifting all their applications over to using a single shared file, but they limit the number of nodes submitting the I/O, to avoid as much lock contention as they can.

How that ties into MPI is that Lustre has an MPI-IO driver for the ADIO interface in MPICH, MPICH2 I believe; it was submitted upstream. It can efficiently set the striping, the number of OSTs involved in writing a single file, based on hints and the rank count and things like that supplied by the MPI application. While it's possible to do all of this by calling an ioctl or a Lustre API directly from your application, doing it inside the MPI-IO layer is a lot easier for most application writers.

One clarification question on your answer there: you said that down at Jaguar, one of the things they do is limit the number of MPI processes writing to the shared file. Just curious: does the MPI library hide that from the application? From the application's perspective, are they all writing, with MPI doing some coalescing down to a smaller number of nodes underneath before doing the file writes, or is that something they actually adapted their applications to do?

I think it's a bit of both. There's a facility in the MPI-IO layer called collective I/O, and for smaller I/O submissions (I believe this is a mode you have to activate, but I'm not totally sure) collective I/O will aggregate small writes from many clients onto the I/O nodes, essentially, merge them based on file offset, and then submit large I/O requests to Lustre. That's done inside the Lustre MPI driver. But I think once you get beyond a certain threshold, the amount of communication you need at the network level to aggregate the I/O isn't efficient, so we don't do that ourselves; it's up to the application to decide which nodes do the aggregation. But that's not my area of expertise.
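A shared-single-file write of the kind described above, with each rank writing at its rank number times some offset and the collective form letting the MPI-IO layer aggregate pieces before they reach the file system, might look like the following sketch. The file name and chunk size are arbitrary. (ROMIO-derived MPI-IO implementations also generally accept striping hints such as striping_factor and striping_unit through MPI_Info, though support varies by version.)

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank;
    const int chunk = 1 << 20;   /* 1 MiB per rank; size is arbitrary */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(chunk);
    memset(buf, 'A' + rank % 26, chunk);  /* fill with a rank-specific byte */

    /* Every rank opens the same shared file... */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* ...and writes its block at rank * chunk. The collective _all
     * variant lets the MPI-IO layer coalesce writes across ranks
     * before the requests reach the underlying file system. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * chunk, buf, chunk,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```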
So, you mentioned striping. Actually, when administering a Lustre system, it seems like I worry about stripes a lot. Could you expand on what a stripe is?

In Lustre, because there are separate servers providing the storage for individual files, one of the decisions made when a file is first allocated is: how many servers will this file reside on? That's a fairly static decision today. When you first create the file, if you don't specify anything, it picks a global file system default number of stripes, or, if you've set striping on a directory, that determines how many OSTs the file is spread over.

The number of stripes you use for your files really depends quite a bit on how your application runs. As you can imagine, if your file is striped over a lot of OSTs, you can get quite high bandwidth, but it's not always the best decision. If you have a thousand clients and they're each writing individual files, then spreading those files over all of your OSTs just increases contention, because now each OST has to handle a small chunk from each of those thousand clients. You'd be far better off with only one stripe, so that a client is only writing to one OST: it keeps the RPCs smaller and reduces the amount of locking that has to be handled for each file. So it's not always a totally obvious decision with Lustre how many objects a file should be striped over. Work that we're doing now for the HSM project is actually also going to allow us to re-stripe files after they've been created, but that's not something supported today.

Now, what about a case like the cluster I operate? We don't run a few very wide jobs; we run many medium to small jobs. We probably have a thousand running jobs, we only have 32 OSTs, and our default stripe count is one. Does that actually help? You mentioned the clients speak directly with the OSTs on the OSS, so are all the I/Os coming from these different independent jobs kind of isolated from each other, because they're load-balanced across all the different OSTs?

Yeah, to a certain extent they are load-balanced. The selection of which OSTs a file is placed on is generally done by the MDS, unless you override it specifically, and the MDS does some work to load-balance based on how responsive the OSTs are. If the OSTs also have space imbalances, meaning the amount of free space between the OSTs differs by more than 10 percent, it will preferentially place new files on the less-full OSTs. So if you have lots of independent jobs, like you say, it tries to spread the work as evenly as possible over all of the OSTs.
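For reference, the stripe decision is something a user or administrator can make per file or per directory: from the command line it's the "lfs setstripe" utility, and from a program it's liblustreapi. Below is a hedged sketch; the header name and exact llapi_file_create signature here are from 1.x-era Lustre and may differ between releases, and the path is made up.

```c
#include <lustre/liblustreapi.h>

int main(void)
{
    /* Create a file striped over 4 OSTs with a 1 MiB stripe size.
     * -1 lets the MDS pick the starting OST, and 0 selects the
     * default (RAID-0) striping pattern. */
    int rc = llapi_file_create("/mnt/scratch/output.dat",
                               1048576,  /* stripe_size    */
                               -1,       /* stripe_offset  */
                               4,        /* stripe_count   */
                               0);       /* stripe_pattern */
    return rc ? 1 : 0;
}
```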
Let me ask you a question in a different direction now. I'm an open source developer myself, so it's always interesting to me to hear how other open source projects are developed. What is your model? I kind of assume, and you'll have to forgive me because I don't know at all, that most of the development happens there at Sun, and the open source aspect is mainly for collaboration with the larger labs, but you're not really actively seeking a development community outside of Sun. That's a pure guess on my part. Could you describe your development model and how you interact with the open source world?

Yeah. Lustre is GPL, and you're right that the primary development does happen inside Sun. We have had some larger contributions from outside parties, in particular CEA, the French atomic energy commission: they finished one project called OST pools, and they're working on a second one, which is HSM, hierarchical storage management, for Lustre. Other groups, Lawrence Livermore in particular, have submitted smaller patches and things like that.

But contrary to your comment that we're not really seeking a developer community, I think the unfortunate thing is that Lustre is complicated enough that it's sort of self-selecting. I would be happy to get lots and lots of patches from the outside world; the truth is that it's just complicated, and people generally don't dig in and fix the problems. In terms of how our company operates, even before we were acquired by Sun, we provide contract support, and for a long time we've also gotten development contracts from our larger customers to implement features they're specifically interested in. But I think it's just so complicated that not many people can understand it, and sometimes we have trouble ourselves.

I can completely understand. With Open MPI, I think we're probably not as complex as Lustre, but you're right that there's a small number of people out there who are willing to dig into the code and say, "Aha, here is the exact place where you're having the problem," or contribute a feature. It's just big and complicated; that's the way it goes.

Let me ask you a mundane question, simply because I ask this of everybody: what version control system do you use?

Currently we use CVS, with a lot of scripts around it to manage branches.

I think you're the first.

Yeah, we've wanted to move off CVS for probably a few years now, and we actually have a project with somebody working on it full-time to move us over to git. We've known about the limitations of CVS for quite a while; it's just sheer inertia that's kept us there. Hopefully the move to git will happen soon.

Absolutely, I can understand that. Okay, so how about a little bit of what's slated for the future of Lustre? There have been a lot of changes going on. What should we expect to see in the next couple of releases?
The main development focus right now is on the 2.0 release of Lustre, though "development" is maybe the wrong word; development is largely frozen, and it's mostly bug fixing on 2.0 now. That release is primarily focused on a rewrite of the metadata server, to make it more scalable and also to give us more flexibility for future additions. There are a number of changes that will allow us to add support for a feature called clustered metadata, which lets us scale the number of MDTs in a single file system, so that we can scale up metadata performance linearly by adding more metadata targets to a single file system. We'll also be able to take the metadata server and the object storage server and move them over to ZFS-based back-end file systems. While none of that is actually in the 2.0 code yet, a large part of the code rewrite was done to facilitate those kinds of changes.

2.0 itself has a limited number of immediate features. One of them is changelogs, which are essentially like an RSS feed for the changes going on in the file system. You can hook a backup tool or an rsync tool into it, and in the future HSM will be able to hook into the changelog and avoid scanning the file system to look for changes. If you look at the scale of file systems we're working on, one of our targets from a customer, actually a whole series of customers, for 2012 is a file system with a trillion files in it. We're not going to be able to scan that every day to figure out which files to back up; hence changelogs.

Going a little farther out, the release beyond 2.0 is going to have the ZFS support and some previews of other features: Kerberos authentication, encryption over the network, and a clustered metadata preview. These will not be production-supported features, but they'll be largely functional and available for people to start kicking the tires on.

So you mentioned earlier that Lustre takes off where a single NFS server, or a big honkin' file server of some flavor, leaves off: where you want the really big file systems, traditionally in the HPC space. Do you ever see Lustre moving into other markets? Do you ever see it becoming a bit more commoditized, for other kinds of data center applications that are not necessarily HPC, but also not necessarily ginormous; they're just big? Do you ever see it becoming more mainstream, I guess, is what I'm asking.

Yeah, there's increasing focus on making Lustre easier to use and configure, and I think that's a prerequisite for any kind of general adoption in a more mainstream environment. The amusing thing is that in the past we have had rocket scientists managing the file system, so if you started off with Lustre a few years ago, it's definitely gotten better since then. There are some applications where Lustre does well even when you don't have a gigantic file system.
There was one partner doing chip layout, and because Lustre is fully cache-coherent among the clients, they can run multi-threaded jobs that scale and cache a lot of data themselves, without the problems they'd have with NFS. There are a few applications like that which do benefit from having Lustre at a lower scale. But, at least for myself, I don't see it making sense once you get below, say, four OSTs, because at that point Lustre is just exporting the local disk, in some sense, and things like NFS can do reasonably well; it doesn't make sense to run everything off a single server. It's actually a common question on the Lustre mailing list, or has been in the past: somebody has two SATA disks and says, "I want to aggregate that storage and do failover and things like that." But that's not really the target audience we have.

Okay, well, this has been a great time. Thank you so much for coming on the show; you were the most requested topic out there. If anyone listening wants to request another topic, there's a nomination form on our website at www.rce-cast.com, and there you can subscribe to the iTunes feed or the RSS feed to get these shows automatically. We kick them out every two weeks. So thanks, thanks a lot again. I appreciate your time.

No problem at all. Thanks.