I'm going to talk mostly about our functionality, because I don't think people really know our project. If somebody wants to go deeper into how things are built or into development details later, we have one of the main developers here who can answer questions like that; I'm coming from the usage and administration side of things.

So, we have a distributed parallel file system that can do geo-replication with its distribution. We have concentrated on simplicity. The basic idea is: if you have tons of data and just need to set something up fast so that you can put it there, we think we have the right solution, because it takes about eight to ten lines of text to set the whole thing up. That's why I used the t-shirt example earlier: you don't go to a rocket scientist to design a new place for your data if you just want to store the things you've just grabbed.

The system has two main components: a metadata server and as many chunk servers as you like. I think the limit is currently 1024 chunk servers, you can store up to one exabyte of data, and the setup is very, very simple; I will show that later in the demo. I set it up yesterday and had some problems because the collectors were not working, but I can show you that. There is basically no real learning curve. There is one command to manage the system, and there are about two config files for a master and three for a chunk server that you mostly don't need to touch; the default configuration will do nearly everything for you. The only things you have to do are to tell the chunk servers where the master is and where you want them to store your data.

So, master and chunk servers. We have options for HA on your masters: when one of the masters fails, the one that has the latest version of the data just takes over. The switchover is very, very fast; in our latest test it took about one second to switch from one master server to a failover one.

You can mix all kinds of platforms. Our software is available for nearly everything Unix-like. I have a setup with chunk servers on BSD, CentOS and Debian mixed; I even have Solaris chunk servers, and they all work together, with the system distributed over all of them. It doesn't really matter. The chunks are saved in a normal file system, so the performance of your chunk servers mostly depends on how you set up the file system underneath. We mostly use XFS or ZFS in our scenarios, but it basically depends on what you have and what you like. There is also an option to use tape as a chunk server, so that replicas of your chunks can automatically be saved to tape drives.

As I will show later in the demo, you can change your storage policies on the fly. So if you decide that you want three copies of a file now, and later decide those files are not important anymore and you only need one copy, it's a one-liner, and the chunk servers will reconfigure that by themselves. You can also group your chunk servers: each chunk server can have a label, and depending on how you set your policies, you can define that certain data is only saved to certain groups of chunk servers. Let's say you have two chunk servers with SSDs and three with hard drives: you give the chunk servers with the hard drives one label and the ones with the SSDs another label, and by defining your policy you can tell with one line where the data is supposed to go.
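As a rough illustration of how those labels and policies look in practice, here is a sketch based on the standard LizardFS goal configuration. The goal names, label names and paths are made up for this example, and it assumes the 3.x unified "lizardfs" admin command and the default /etc/mfs paths, so check the mfsgoals.cfg.dist shipped with your packages:

    # /etc/mfs/mfsgoals.cfg on the master
    # id  name    : expression   ("_" is a wildcard for any chunk server)
    1     1       : _
    3     3       : _ _ _
    10    fast    : ssd ssd            # two copies, only on chunk servers labelled "ssd"
    11    archive : hdd hdd hdd        # three copies, only on "hdd"-labelled chunk servers

    # each chunk server announces its label in mfschunkserver.cfg, e.g.
    #   LABEL = ssd

    # then one line per directory decides where the data lives:
    lizardfs setgoal -r fast    /mnt/lizardfs/scratch
    lizardfs setgoal -r archive /mnt/lizardfs/old-projects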
Besides replication, we also support erasure coding. The advantage of erasure coding, besides giving you more usable space, is that the system starts striping: as soon as you set an erasure-coding goal, your writes become parallel. With replication, your client writes to the first chunk server and that chunk server does the replication; with erasure coding, say three data parts and two parity parts, the client writes to five chunk servers in parallel. I think the smallest stripe is about 128k or 64k; below 64k we will not cut the file into pieces. Since erasure coding parallelizes the writes, you get nearly the same performance from a few fast systems as from a lot of slow systems. That makes it relatively easy to have an upgrade path where you start with the systems you have and replace them on the fly at any time. With the labelling I talked about, you can also create a kind of basic tiering, because you can label chunk servers by speed and define where your data goes. It's not a replacement for automatic tiering yet, but it's a start. You can have as many labels as you want, and since the movement is very fast, it's relatively easy to create manual tiering on your data path.

For master server high availability we developed a system that is based on the RAFT protocol and uses a quorum. The reason we developed it is that we had quite a few problems with what was already available: there were a lot of split-brain scenarios, and in general the switchover was too slow. With our new HA for the master servers, the switchover time is, in my experience, under one second.

If you configure your policy to be replication, your client basically starts sending data to the first chunk server, and depending on how many replicas you set, that creates a chain reaction: when the first slice of data arrives at the first chunk server, it starts replicating to the next one, and so on, until all the data is copied and all the replicas are done. That way your client is done as soon as it has copied the data to the first chunk server and doesn't have to worry about the replicas.

We have metadata-only snapshots; basically it's a copy-on-first-write system. As in ZFS, your snapshots initially occupy no space, and right now we have no limit on the number of snapshots.

When you add disks or new chunk servers, the chunk servers will try to balance out the space during write operations. The question was about the auto-balancing: no, if you set a goal of three copies for every chunk, it will still keep three copies on three different chunk servers. The point is that it tries to keep the same amount of data across all the drives and chunk servers you have. Unless you use labels, it does not stick to particular chunk servers: you can set a policy of three and have 20 chunk servers, and it will always keep three copies, but it will not tell you where. It will distribute and auto-balance the data so that all your chunk servers are utilized in the same way. How it chooses depends on whether it's replication or erasure coding. If it's replication, I think it's round robin? No? If you have nothing defined, it will try to balance by free space and by how busy a chunk server is; if you have labels, it will go wherever you defined it should write to.
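A sketch of what the erasure-coding policy from the example above (three data parts plus two parity parts) could look like; the goal id and name are arbitrary, and the $ec(k,m) syntax is from memory, so verify it against your mfsgoals.cfg.dist:

    # /etc/mfs/mfsgoals.cfg: an EC goal with 3 data parts and 2 parity parts
    20  ec32 : $ec(3,2)
    # optionally restricted to labelled chunk servers:
    # 21  ec32_hdd : $ec(3,2) { hdd hdd hdd hdd hdd }

    # apply it to a directory; writes are then striped over 5 chunk servers in parallel
    lizardfs setgoal -r ec32 /mnt/lizardfs/render-output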
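The metadata-only snapshots mentioned above are driven by the same one-command tooling. A sketch, with made-up paths and assuming the makesnapshot subcommand of the 3.x tools:

    # snapshot a whole directory tree: only metadata is copied at first,
    # chunks are shared until one side is modified (copy-on-first-write)
    lizardfs makesnapshot /mnt/lizardfs/projects /mnt/lizardfs/projects-snap-2017-01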
You don't speak directly to the chunk server; you speak to the metadata server first, and that tells you where to find your chunk.

Yes, I can show that, it's pretty easy. On the chunk server you have a definition file where you specify the directories where it's supposed to save its data. If you put a star in front of an entry and reload the chunk server, it will empty that directory. So you just wait until all the chunks are moved to a different safe place, and when they are moved, you switch it off. So shrinking is very simple.

Yes, unless you have a goal of one; then you're out of luck. As long as you have set your goal so that you have enough replicas, yes: it will automatically re-create the missing chunks on the next available space, if you have any. The only scenario I have experienced in the last months where we had a problem, which funnily enough balanced itself out after a while, was when somebody took live disks out of one chunk server and shoveled them into another chunk server. That was tricky, but it still managed to delete old versions, start replicating things around, and it balanced itself out. I was pretty surprised that it managed that, but it did. So it's pretty robust. Does anybody have any other questions?

So yeah, we run, like I said, on nearly everything. I personally have tried BSD, CentOS, Debian, Solaris, what else? Oh, and I have a server and a client on the Mac. There is a commercial Windows client for people who insist on that operating system. I don't think there's any platform we couldn't support. I know there are people running it on ARM; there's a relatively small community for that, and there's one university in Germany that wants to set up a rather large ARM scenario. I'm pretty interested in how that will work with the new 64-bit ARM. So if you require some other adventurous Unix-like operating system, we can probably port to that as well. The FreeBSD port took about three hours.

So let me just show you; I'll connect the demo computer. I created a very small setup to show how LizardFS gets set up in a minute. I just have to wait for the network to wake up. Does that fit? Yeah. I basically set up a couple of containers in Proxmox, just to show a fast setup. So all I basically do is... Can you invert the terminal colors? I'll try, I just have to get that back. I don't remember how to invert it, but I can at least make it... that won't help. Is that readable? Yeah. Cool.

So, we hardcoded a few things. If you don't set a name for the master server in the configuration files, it will assume it's called mfsmaster. So all you have to do for a fast setup is put an mfsmaster entry in your hosts file and point it to the IP of your metadata server; that mfsmaster entry sets it up.

There's one thing I didn't talk about: you can define a topology for your setup. You can set which chunk servers should be preferred by which network segment, so if you have a large, complex network infrastructure, you can make certain clients talk preferably to certain chunk servers. We have a customer with a large rendering farm and a pretty chaotic network structure, and since they try to keep copies of all the data on all the chunk servers so they are close to the rendering nodes, they use topology to have each client always talk to one particular chunk server.
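To make the hosts-file trick and the topology idea concrete, here is a rough sketch; the addresses and switch numbers are invented, and the mfstopology.cfg syntax should be double-checked against the shipped .dist file:

    # /etc/hosts on every machine: the hardcoded default master name is "mfsmaster"
    10.0.0.10   mfsmaster

    # /etc/mfs/mfstopology.cfg on the master: map IP ranges to a switch/rack id
    # so that clients prefer chunk servers in their own network segment
    10.0.1.0/24   1     # rendering nodes and their nearby chunk servers
    10.0.2.0/24   2     # office network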
So what I basically do is copy the sample files from the packages, and I don't change anything. I have to create an empty metadata file, and then I can just start the metadata server. Let's check this one again. And we have a very simple web interface that lets you monitor everything. At this resolution it's a bit limited, but you can see we have a metadata server started, how much RAM it has, and how much RAM it occupies. All the metadata is kept in RAM while the system is working, so the number of files you can store in a LizardFS system is basically limited by the amount of RAM you put into your metadata server.

So we have a metadata server running. I will attach two chunk servers to it, which is also quite simple. It's just one package, called lizardfs-chunkserver. I have created a directory for the chunks, called lizardfs1, and there are two files to configure: one basically tells the chunk server where to store its data, and the other is a configuration file where I can keep the default values, because I put mfsmaster into the hosts file. And you can see I now have a chunk server running with four gigabytes of available space. Adding another chunk server is just doing exactly the same. And you can see in the example configuration file for the hard disks how you can remove a hard disk: you just put a star in front of it and it will redistribute all the chunks to different storage spaces.

We have our own protocol; the client is FUSE-based. We are currently working on some other clients. One is a native client for QEMU that should be released pretty soon. Another is an emulation interface for HDFS, so that you can connect Hadoop-like systems to it. And there is a plan to release an NFS gateway, so that you could connect locally via NFS and it would speak LizardFS on the network side.

Does it support multi-tenancy? Multi-tenancy in which way? Having different users that can mount a shared storage with quotas, without seeing each other's files. Well, we have a full POSIX implementation, which is not quite the same thing, and you can define rights for hosts per directory and per sub-directory. It's basically similar to how you would do an NFS setup: you define on the master which part of your namespace you want to be available to which client, and the rest of the namespace is not visible to that client. It won't be totally hidden: you can inhibit reads, but I don't know whether it hides the fact that a directory exists at all.

Which database are we using for metadata? It's an in-memory data structure written in C++. What's the performance like: if I have a single client server with a 10-gigabit interface, is that something I can saturate? It depends on how many streams you have. A single stream currently does about 600 megabytes per second; the QEMU driver seems to be a bit faster, about 750 megabytes per second in our last test, but that's per single stream. So for example with KVM hosting, we usually do one mount per virtual machine, and each mount gets those 750 megabytes per second. So yes, you can saturate it, but not with a single stream.

Is that a good choice for metadata-heavy workloads like mail storage? If it's mail storage with lots of single small files, not yet. Like we said before, if a file is under 64K it will not be split, so it will not really be distributed and the performance goes down.

Are the snapshots on the file level? You can snapshot any object.
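To put the "eight to ten lines" claim in concrete terms, this is roughly what the demo does, written out as shell. It's a sketch assuming the Debian packages with their default /etc/mfs and /var/lib/mfs paths and that "mfsmaster" already resolves to the metadata server; adjust for your distribution:

    ## metadata server
    apt install lizardfs-master lizardfs-cgiserv
    cd /etc/mfs && for f in *.dist; do cp "$f" "${f%.dist}"; done   # keep the sample defaults
    cp /var/lib/mfs/metadata.mfs.empty /var/lib/mfs/metadata.mfs    # create the empty metadata file
    service lizardfs-master start
    service lizardfs-cgiserv start      # the web interface, usually on port 9425

    ## each chunk server
    apt install lizardfs-chunkserver
    mkdir -p /srv/lizardfs1 && chown mfs:mfs /srv/lizardfs1
    echo "/srv/lizardfs1" >> /etc/mfs/mfshdd.cfg     # where this chunk server stores chunks
    cp /etc/mfs/mfschunkserver.cfg.dist /etc/mfs/mfschunkserver.cfg   # defaults are fine
    service lizardfs-chunkserver start
    # to drain a disk later: prefix its mfshdd.cfg line with "*" and reload the chunk server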
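The per-directory access control from the multi-tenancy question lives in mfsexports.cfg on the master, very much like an NFS exports file. A sketch; the subtrees, networks and passwords are invented, and the exact option names should be checked against mfsexports.cfg.dist:

    # /etc/mfs/mfsexports.cfg
    # ADDRESS        DIRECTORY    OPTIONS
    *                /            rw,alldirs,maproot=0                       # admin access to everything
    10.0.1.0/24      /tenant-a    rw,alldirs,maproot=0,password=secretA
    10.0.2.0/24      /tenant-b    rw,alldirs,maproot=0,password=secretB
    # each tenant can only mount (and see) its own subtree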
I don't have the client here right now, but you can change the policy with one command on any object. So you can set a goal of two on your directory now and change it to a goal of three just by saying "set goal three", and it will immediately start creating the next replica. You can move from EC 2+1 to a replication of five with a one-liner. Which is quite interesting if you want to do backups: if you have chunk servers that are defined as tape drives, you could just do a policy change on a directory and you would have automatic backups to LTO tapes, and you get the data back by changing the policy on that directory with another line.

Do we have an idea of the ratio of RAM to the number of files? Yes, the number of files per amount of RAM; I have a table, but I don't remember it. One second, I'll get you the table. I can give an example: we're running it at a school with goal level four and we have about 15 million files.

Let me go here. So you have... there's a client. The client has a config file where you basically define your mount point; if you want, you can also do it on the command line, but I can't type today anyway. Basically all you have to do is give it the directory. Again, it has to know where mfsmaster is: if you use a different name for your metadata server, you have to set it in the configuration file; if not, it works right out of the box. And there's your directory, it's mounted. It has a goal of one, so there's only one replica; it doesn't really make sense to set a different goal since I only have one chunk server right now. Setting a different goal would just be "set goal" two, three, five, ten on that path, and it would distribute that many chunks. If you set a goal higher than the number of chunk servers you have, would you end up with fewer copies, or would it fail? It degrades gracefully: it keeps what it can and waits for you to add more chunk servers. Once you add a new chunk server, it automatically starts creating the missing chunk copies.

Yes, it dumps the memory to disk. How often do we dump the metadata to disk? We create one dump per hour, and you can set it up in the configuration to do it more often. Plus you can create so-called metaloggers that constantly log all the changes on your metadata server.

Integration tests? Yes. We mostly do load tests with any tool I can find. The test for 3.10.6 was: I had about 12 chunk servers and I was bombarding them from multiple streams with Iometer, IOzone and FIO at the same time, and then I started kicking chunk servers out. I think for redundancy testing that's good enough; constant 100% load is always a nice test. There's not much more you can do than that, actually.

What methods can you use to make the metadata server highly available? We have members of the community who have tried it with other tools. We have our own system, a commercial solution we developed that's based on the RAFT algorithm, but it's your choice.

So the mount point on the chunk server can be any file system? Yes, it's just a directory. Mixing different file systems? That's no problem either.
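And the client side of the demo, plus the metalogger that was just mentioned, again as a sketch with example host and mount names; mfsmount is the FUSE client from the lizardfs-client package:

    ## client
    apt install lizardfs-client
    mkdir -p /mnt/lizardfs
    mfsmount /mnt/lizardfs -H mfsmaster      # -H can be omitted if "mfsmaster" resolves already

    # change the replication policy of a directory on the fly
    lizardfs setgoal -r 3 /mnt/lizardfs/important
    lizardfs getgoal    /mnt/lizardfs/important

    ## optional metalogger: continuously follows the master's metadata changelog
    apt install lizardfs-metalogger
    echo "MASTER_HOST = mfsmaster" >> /etc/mfs/mfsmetalogger.cfg
    service lizardfs-metalogger start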
Performance-wise, you do have an impact depending on what you choose, but with replication I don't really see a big one. There will be a very big impact if you use erasure coding, though, because your writes are parallelized and you will always get the performance of the slowest chunk server in the write: the write is only finished once all the chunk servers in that stripe set are done.

Yes, I actually have the same experience as the Ceph guys, that everything starting with ext is not really a good choice. We have worked quite nicely with XFS, and for very high performance we have ZFS setups with all the tuning you can do there. With ZFS, your only limit is that you shouldn't fill your file system to more than 80 percent, because after that your performance drops off. I can answer any other questions afterwards, because time's up. Thank you.