 So we should start. You can go to lunch earlier. Thank you for coming. If you're here, it means you probably care more about btrfs than the kernel as a whole; otherwise you'd be listening to John's talk about all of the kernel. I'm not John, but I'll try to do as good a job as he does. Feel free to interrupt me with questions during the talk, especially if it's something about the slides; otherwise we can do questions at the end, and you can also grab me after the talk. So why am I here talking to you about btrfs, since I didn't actually write a single line of kernel code in btrfs? I have some userspace stuff that I'll talk about later. I've been using Linux for a long time, a bunch of file systems, back when we even had ext2. But interestingly enough, I worked at NetApp a long time ago now, and even back then they had file system snapshots. And it's a little bit like crack: once you've had it, you just don't want to give it back. You have history. Whatever you had an hour ago, or yesterday, you can go back and get it. It's just freaking awesome. I wanted that on Linux, and Linux had LVM. So, oh, great, I'll just snapshot my block device. Except, oh my god, LVM is so slow. If you do snapshots, it just multiplies writes. I got down to about two megabytes per second on my RAID-5 server at home. It was unusable. I mean, I needed it, but I wanted something better, and I'd been wanting something better for a long time. Eventually — we had a few things that came in the middle, but nothing really worked well for me until btrfs. And we'll talk about ZFS also. So that was one thing. The other thing I like about LVM is having partitions without having to actually repartition a disk, which is a pain in the butt, as most people know. LVM did a good job with that, but again, the performance was an issue.
So I switched to btrfs about three years ago after going to an LCA talk — strangely enough, about btrfs. Back then it was still a bit rough around the edges. It's definitely better now, so I won't have to be lying to you as much as the nice person who told me btrfs was great three years ago — I'm just joking. So why should you consider btrfs? One really big feature, which is not new, but which you can have today in Linux and wasn't really available before, is copy-on-write at the file system level, and I'll give you more details as to why it actually matters. Snapshots, as I mentioned. One thing you can do between subvolumes is copy a file without actually copying the blocks. You just say, hey, here's a new inode — they're different inodes, different owners, different permissions, but they point to the same blocks until you start modifying one block, and then just that one block gets copied, which is a much better thing than just hard links. Metadata is redundant and checksummed, which is not quite true of ext4 — they have patches for that. If you use things like Docker, btrfs is really a nice underlying file system for Docker, and they explain why. Also, SUSE now offers a very nice option where you do an upgrade — and you can do the upgrade any way you want: command line, RPM, whatever — and it snapshots before and after and shows you which files changed, without you even having to know why, and then you can roll back to the previous system if your system is not very happy anymore. Another thing is built-in RAID; I'll give you a bit more details. To be fair, built-in RAID is not totally finished. It mostly works, but a few things are not fully on par with DM RAID in the kernel. It's getting there, though, and having it in the file system removes one layer in the middle and makes it more efficient, a lot faster.
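The reflink copy described above can be sketched from the shell. This is a minimal example with a made-up scratch directory; `cp --reflink=auto` falls back to a normal copy on non-btrfs file systems, so it is safe to try anywhere (on btrfs, `--reflink=always` would force block sharing):

```shell
dir=$(mktemp -d)                     # scratch directory (hypothetical location)
echo "some large payload" > "$dir/disk.img"
# On btrfs this creates a new inode that shares the same data blocks;
# blocks only get duplicated once one of the copies is modified.
cp --reflink=auto "$dir/disk.img" "$dir/disk-copy.img"
cmp "$dir/disk.img" "$dir/disk-copy.img" && echo "copies identical"
rm -r "$dir"
```

On btrfs the copy is effectively instant regardless of file size, because only metadata is written.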
It has file compression, which is always very nice for some kinds of data. As I mentioned, partitions — I'll give you details — you create one big pool, and then you create subvolumes that actually act as partitions. You can do an online background scrub, which is not a full fsck, but it will find some issues without you having to unmount your file system. The big thing, which I'll detail and which is part of the reason I gave the talk about file system pushes via diffs, is btrfs send and receive, which basically lets you snapshot the file system, compute a diff between the previous snapshot and the new one, and just send that diff to the other side, so the file system on the other side stays in sync with the server by sending over just the changed blocks, as opposed to doing a big copy like rsync would. And — I find it kind of cool, although it's better to start clean — btrfs-convert can actually convert an ext3 file system to btrfs without reformatting. The idea is scary, but it does that. So, before anyone asks: hey, don't we have ZFS? Yeah, we do, actually. ZFS is still more mature than btrfs today. If you didn't care about licensing, if you're on Solaris, ZFS is actually a really good file system. It has a lot of people working behind it, it definitely has more features, and it's more stable. But it wasn't really designed for Linux. There are people who have ported it to Linux — it's called OpenZFS — and they do a good job trying to make it work as well as possible, but it wasn't designed to work with the Linux memory subsystem, so it does use more memory. But that's just RAM; some people don't care, you just pay for it. The bigger problem is licensing. I'm not an open source lawyer, thankfully for me, but basically Sun owned that code, and Oracle bought Sun.
So Oracle effectively owns the patents and the licensing to the original code. You may have heard that the ZFS license is not compatible with the Linux kernel's: you can put them together yourself, but you cannot distribute the result to anyone, which is kind of a problem. Now, Oracle happens to be the company that put a lot of time and effort behind btrfs. So I asked myself: why would they still be writing btrfs when they have the ZFS code, which is perfectly good — I mean, not perfect, but much further along? I never really got a good answer for that. They have a page that says, well, we would have to go talk to all the people who gave us patches and ask them if we could relicense their code, and that's a lot of work. I'm not quite buying that; it's been done before. So I looked into it a bit more, and when I asked them, they said, well, you know, we'd rather focus on btrfs — that's the future for us. They didn't tell me "ZFS is dead as far as we're concerned", but I think that's what I heard. And by the way, all the quotes here, and especially all the links you see — you can get the slides, and those links are clickable, so you can get the longer quotes, and I'll be giving links to scripts later, so you don't have to write everything down. Another person working on this said ZFS on Linux is a non-native approach. As I said, it doesn't integrate as well with the memory subsystem, but I still think it would be a lot easier to fix ZFS to be a first-tier file system for Linux if it weren't for the licensing issue. Then again, patents. NetApp actually sued Sun over ZFS because it infringed WAFL patents — WAFL being the NetApp file system. Now, it turns out Sun actually attacked NetApp first, so effectively it's just another patent war: a lot of good lawyer and engineer time wasted dealing with this.
And, well, they obviously had lawyers; they ended up settling out of court, so we don't know exactly what happened and who said what. But my take is that they may have made an agreement that Sun, or Oracle, was not going to keep working on ZFS. That's just my guess; it could be something totally different. But Oracle is a bunch of smart people, they have that really good code base, and they're not using it — so there must be a reason. Whatever the reason is, here's where we are today: ZFS is not going to become GPL tomorrow. It will never be in the Linux kernel. You will never be able to make a product and distribute ZFS with the kernel. So if you want to use it at home for your home server, by all means do so — you can totally do that. If you're using it at your company for internal servers, that's fine too, until you try to ship a product with it. And I've seen many places where you build something internally and then one day someone says, hey, we should sell that. If your product depends on ZFS, then you're in trouble. As an example, even at Google: Google search was supposed to be an internal product, and now there's a Google Search Appliance that is being sold. If it required ZFS, we could not ship that without being sued. So that's it for ZFS — this is not a ZFS talk, but I had to explain why you'd even bother with btrfs. The next thing is: you keep hearing how btrfs is not really stable yet, or that it's still experimental. Technically it is marked experimental in the kernel, but it is pretty stable. It was stable enough when I started using it three years ago, and it's definitely much better now. But you have to be careful with how you use it, and you definitely have to be careful with backups and so forth; I will give you details on that. I give some kernel versions here — you definitely want newer kernels.
You don't want to be using something very old, because there are many, many bugs being fixed all the time. Currently I would recommend anything newer than 3.16.2. Let's see. One thing that can be an issue: sometimes it needs to be manually rebalanced. You have space, but it tells you your file system is full. It's not fatal, but it means you have to do something yourself to unwedge the file system, which is a little bit annoying. It's doing that better by itself now, but it's still not fully automatic. Defragmentation still has a few issues too. Send/receive is definitely usable and perfectly good as of 3.14, which is now reasonably old — I don't want to say ancient, but it's not a bleeding-edge kernel. RAID 5/6 — I'll give you more details — is usable, but I wouldn't put anything production on it. So what's missing? Everyone says, oh, but there's no fsck, you can't use that. Well, okay, there is an fsck. It mostly works for almost everything. The other thing is that most of the time you do not need to use fsck with btrfs, just because of how it works: it has consistent points that it writes, and if you pull the drive's power or anything, it can go back to the last consistent point. You don't have to scan your entire file system, find blocks, and decide which inodes are free or leftover or anything like that. So unless you hit a bug, or something bad happened — like your hardware not writing things the way it was supposed to, which does happen — in normal cases you don't actually need to run fsck. Encryption is not in btrfs yet; you can do it underneath, and I'll explain that. Block deduplication is where you have blocks that you wrote independently and you merge them back together so they share data. There is experimental code for that, but it's not quite mainline and automatic yet.
There are a fair number of companies with people working on btrfs — you can read the list, I'm not going to read it for you, but it's not just one or two people doing this. Who's using it in production? A little while back I tried to get a list of companies who admitted to using btrfs. The page may not be up to date, but it gives you an idea of a few companies at least, and there are some big names in there. There are also companies like Fujitsu actually selling products based on btrfs, and Fujitsu is not exactly a small storage company. So there are people who definitely care and are putting effort behind it, which is always a good sign. And you can read the rest of the btrfs and LWN quotes. So, you're here because I can hopefully save you time on reading all the docs and figuring out what btrfs can do and how you should use it, because it definitely doesn't quite work like the file systems you're used to, like ext4. We'll look at all those points here and I'll go through them one by one. So, recovery. I hate to start with when things go wrong, but really, as a sysadmin, when I hear about data storage, I want to know what I'm going to do when something happens — and it doesn't matter whether you tell me it's stable or not, I expect that things will go wrong. For recovery there's a pretty good wiki. By the way, btrfs has man pages, but the btrfs wiki is definitely the place to go for cookbooks and details on features and what to do when things go wrong. And I wrote some pieces of it, because I figured it might as well be in a place where everyone can read it. So, scrub, as I mentioned, is something you can run maybe every night or once a week, and I wrote a little script that runs scrub, finds errors, and emails them to you. That way you don't find out too late that you have data that's not quite the way it's supposed to be.
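A minimal sketch of that scrub-and-report idea (this is not the speaker's actual script; the pool path and the use of `mail` are assumptions, and it assumes `btrfs scrub start -Bd` blocks until the scrub finishes and exits nonzero when errors are found — check your btrfs-progs version's behavior):

```shell
# report_on_failure CMD...: run a command and surface its output if it fails.
report_on_failure() {
    if ! log=$("$@" 2>&1); then
        echo "FAILED: $log"
        # In a real cron job you would mail this instead, e.g.:
        #   echo "$log" | mail -s "btrfs scrub errors" root
    fi
}

# Nightly use from cron (hypothetical pool path):
#   report_on_failure btrfs scrub start -Bd /mnt/btrfs_pool

# Demo with a stand-in command so the wrapper can be exercised anywhere:
report_on_failure sh -c 'echo "simulated scrub error"; exit 1'
```

Successful runs stay silent, so cron only mails you when something is actually wrong.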
Then you can mount in read-only and recovery mode, which tells btrfs not to trust all its data blocks and still lets you mount a file system that doesn't look quite right, but well enough for you to get data off it if something bad happens. As I mentioned, it's a logging file system, so if it somehow gets wedged between two states — where it mounts a new state but finds stuff that's not quite right — it will not automatically throw away the new state, because maybe there's something in there you might want. So zero-log lets you zero out the last bits that were written and go back to the previous state, and then you're back to a good file system from before random crap got written. And btrfs restore is a pretty nice tool that basically takes a btrfs disk image and scans for files inside, so you don't have to do it yourself with complicated tools. Even if structures are missing or things went really bad, it will try to find data and save it somewhere else. And check --repair, which is also the btrfsck program — that one is the fsck you're used to. It is not nearly as good as the one for ext4. It will work, but if there's something you really, really, really care about, you might want to get it off through the other means first, and then you can try to fsck your file system. If you're happy enough with the output, you can continue using the file system; but if the repair does a lot of things that look sketchy to you and you haven't gotten all your data yet, get your data off and just remake the file system. Again, this is not something you do daily — I'm just telling you what you would do if bad things happen. And I got a fair amount of experience with that, because I had multiple bad SSDs which would not do what they were supposed to.
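The escalation order just described, as a dry-run checklist. Device and mountpoint names here are made up, and `run()` only echoes each command; change it to `"$@"` to execute for real, as root, on an unmounted file system. Note that the `recovery` mount option of this era was later renamed `usebackuproot`, and old btrfs-progs shipped zero-log as a standalone `btrfs-zero-log` binary:

```shell
dev=/dev/sda2; mnt=/mnt/rescue        # hypothetical device and mountpoint
run() { echo "+ $*"; }                # dry-run wrapper; use "$@" to really run

run mount -o ro,recovery "$dev" "$mnt"    # 1. mount read-only, recovery mode
run btrfs rescue zero-log "$dev"          # 2. drop the last (bad) log writes
run btrfs restore "$dev" /mnt/recovered   # 3. pull files off an unmountable FS
run btrfs check --repair "$dev"           # 4. last resort: btrfsck repair
```

Work down the list in order; each step is more invasive than the one before it.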
I mean, they would also die and corrupt data on top of that, but if you cut their power, they would actually not have written all the data they had already said they wrote, which then made btrfs unhappy. Those were not btrfs bugs, but when you mount the next time it says, ugh, my blocks are wrong. So, the next thing: I always like to plan ahead, because by the time you're traveling with your laptop and you're doing a btrfs check and you don't have internet because you're on a boat — which has never happened to me, of course, otherwise I would not have written this slide — you want to be ready beforehand. On a laptop, for instance, if you only have one drive and your root file system is on it and it doesn't mount, recovery from there is kind of inconvenient, to say the least. There are a few things you can do. Of course, you can give the recovery option I just mentioned; that might be enough to get you out of trouble. Next, most people use an initrd nowadays — make sure you have the btrfs tools inside your initrd, so you can fix the file system from there. Otherwise you cannot mount the root file system your tools are on, and then you're in trouble. I'm personally a little bit extreme: I actually have two drives in my laptop, a one-terabyte SSD and a one-terabyte hard drive, and I make sure both are bootable and I copy one to the other — btrfs makes that very easy; I'll give you more details on that later. The little snippet I gave at the bottom makes sure the btrfs tools do get included in your initrd when you build it, because by the time you need them, it will be too late to add them. The scrub — I did mention scrub. If you're interested, you can read the slide in more detail and the script linked at the bottom, but effectively it will check all the metadata blocks and make sure the checksums are okay, so you don't have random corruption.
If you're running RAID, it will make sure both sides of the RAID agree on what your data should look like. There's a longish slide to help you fix an issue with scrub where it would tell you it's running when it's not, because it got interrupted, maybe by a reboot or a crash. It's a simple thing to fix, so if you ever get there, you can just go back to that slide. All right, I mentioned no encryption. I personally encrypt everything, just because I don't want to have to think about what should be encrypted — things are just encrypted, and I'm done. Now, I mentioned there's no encryption in btrfs, so you would use dm-crypt as one option. Of course, you can also put encryption on top of your file system, but those options are usually not as fast and not the ones I prefer. So on the dm-crypt side: if you have dm-crypt and btrfs and you're running RAID — maybe a RAID 1 or RAID 5 — for RAID 5 I would still definitely recommend you use block-level RAID 5 today, since the btrfs one is not complete. You then have two ways of layering this: you either run dm-crypt on top of each drive and RAID 5 on top of dm-crypt, or you do it the other way around. What I'm explaining here is that, in my opinion, you're better off doing RAID 5 first and putting dm-crypt on top. The main reason: when you're doing RAID-5 resyncs, the block layer doesn't have to decrypt and re-encrypt anything — it's just copying blocks; they happen to be encrypted, but it doesn't care. And putting dm-crypt on top also makes things easier because you only have to decrypt one block device at boot. You can do it the other way around, though, and if you want to use the built-in btrfs RAID 5, then you do have to encrypt every single device.
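The RAID-5-first layering can be sketched as a dry run (device names and the pool label are hypothetical; `run()` only echoes the commands — change it to `"$@"` to execute for real, as root, on empty disks):

```shell
run() { echo "+ $*"; }   # dry-run wrapper

# 1. Block-level RAID 5 across three hypothetical partitions:
run mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
# 2. Encrypt the array as a whole, not each drive, so resyncs copy
#    already-encrypted blocks and only one passphrase is needed at boot:
run cryptsetup luksFormat /dev/md0
run cryptsetup luksOpen /dev/md0 cryptpool
# 3. btrfs on top of the decrypted mapping:
run mkfs.btrfs -L btrfs_pool /dev/mapper/cryptpool
```

The reverse order (dm-crypt per drive, btrfs RAID on top) works too, but then every device must be decrypted at boot.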
There's a script linked at the bottom of this slide, but effectively what you do is decrypt all the drives; they show up in devmapper, and btrfs scans all available devices. Then you tell it to mount one of those devices, and because btrfs knows what all the other ones are, from that one it will figure out which file system you're mounting and find the others for you. You only have to give the name of one device and the rest just works by itself. You can also mount by label, of course, like many file systems. So, partitions: you don't need to make partitions anymore. The only reason you would make a partition, if you have a single drive, is to have a second root file system on a second partition — so if your first partition, with everything else, gets corrupted in a way that won't mount, you have a backup root partition to boot from. If you have two devices, you just boot from the second device. So you create a storage pool, which is basically all your btrfs blocks, and in that pool you create subvolumes, which are really just fancy directories. Each subvolume can then be mounted as if it were a mount point. I'm giving the commands here: you create them with subvolume create, and there are multiple ways of mounting them. The first: you give subvol=root, where root is basically the name of the subdirectory. The other option — that's the last line on the slide — is to mount the pool with all the subvolumes and do a bind mount from that directory to the destination directory where you want the file system mounted. And subvolumes — you can think of them as almost different file systems, even though they're contained within the big btrfs file system.
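The two mount styles described above, as a dry-run sketch (label and paths are made up; `run()` only echoes — change it to `"$@"` to execute on a real btrfs pool):

```shell
run() { echo "+ $*"; }   # dry-run wrapper

run mount -L btrfs_pool /mnt/btrfs_pool              # mount the whole pool by label
run btrfs subvolume create /mnt/btrfs_pool/root      # a subvolume acting as a partition
# Style 1: mount the subvolume directly by name
run mount -o subvol=root LABEL=btrfs_pool /mnt/root
# Style 2: mount the pool, then bind-mount the subvolume's directory
run mount --bind /mnt/btrfs_pool/root /mnt/root
```

Either style ends with the subvolume visible at /mnt/root as if it were its own partition.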
Snapshots. The reason why, even if you have one partition for everything, you still want to create one subvolume to put everything in — as opposed to putting it at the root of the btrfs file system — is that once you have subvolumes, you can snapshot them, and that's how you do backups and how you replicate to a different place. The command here just shows you how it works. It's very, very simple: you make a subvolume test, you create a file in it, then you snapshot it. You create a second file, delete the first one, and the first file is still there in the snapshot but not in the original subvolume. Not rocket science, but just to show you what the commands look like. If you look at subvolume snapshots with subvolume show, a subvolume that was snapshotted keeps track of all the snapshots of itself, so you can see it was snapshotted once; and the snapshot itself knows what it points to, which is obvious since it needs to know where it came from. So, yeah, backups. I tend to make the point that snapshots are super nice but they are not backups, and because I like to make points, I'll make this one really well by having two different slides saying the exact same thing. Snapshots are really good for "oh, I just overwrote my file and I really wish I had the copy from an hour ago" — then you can go back in time and grab that copy. Great. But they do not help you with hardware failure, they don't help you if your file system gets scrambled, or anything else. It's just one more layer that saves you from having to fetch your real backup: if you're on your laptop and you want a file, you don't have to go find your backup server and retrieve it; you can just copy it from your snapshot. So it's a time saver, not a backup mechanism — as long as everyone agrees with me on that.
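The snapshot demo just described, as a dry-run command sequence (pool path hypothetical; `run()` only echoes — change it to `"$@"` to try it on a real btrfs mount):

```shell
run() { echo "+ $*"; }   # dry-run wrapper

run btrfs subvolume create /mnt/btrfs_pool/test
run touch /mnt/btrfs_pool/test/file1
run btrfs subvolume snapshot /mnt/btrfs_pool/test /mnt/btrfs_pool/test_snap
run touch /mnt/btrfs_pool/test/file2
run rm /mnt/btrfs_pool/test/file1
# At this point file1 only exists in test_snap, file2 only in test.
run btrfs subvolume show /mnt/btrfs_pool/test   # lists the snapshots taken of it
```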
So the script — which I'm not going to paste here because it's kind of long — shows you the idea of snapshotting the file system. root is basically a subvolume I made, which I then mount onto /. There are daily, hourly, and weekly rotating snapshots; you just say how many of each you want, and that way you can go back in time. Now, atime and relatime with snapshots. Say you have, let's say, just 16 snapshots of your file system. Every time I touch a file, with atime the access time gets updated on every access; with relatime, once a day. And when that happens, the inode has to be copied, because it's now different from the one in all the snapshots. So if you already have snapshots, every time you access a file, you're now creating more duplication of your data — the simple act of reading data will actually use up disk space, which is not very intuitive. Unless you really, really, really need access times, which most people don't nowadays, just turn them off and you'll be done. And just to be clear: relatime is usually the default on most distributions, and people think, oh, I'm good, I have relatime — but it still updates the atime once a day, which still causes these problems. Now, there are bigger reasons for running out of space. The biggest one I mentioned very early on: btrfs does not always rebalance its data by itself. You have data chunks and metadata chunks, and depending on how they're laid out on disk, and whether you create and delete stuff in the wrong order, you can end up with basically no place to create a new metadata chunk — without going into very long details. In that case, you will need to do a rebalance. It's a pretty simple command, and it can be done on the running file system. It is IO-intensive, though.
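Turning access times off is just a mount option. A hypothetical /etc/fstab line (label and mountpoint made up for illustration):

```
LABEL=btrfs_pool  /mnt/btrfs_pool  btrfs  noatime  0  0
```

With noatime set, reading a file never dirties its inode, so reads no longer duplicate inodes against your snapshots.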
And rebalance will go find your data and basically try to free up partially used chunks, copying data into existing ones — it basically defragments your space at the chunk level. Another cause: if you have snapshots — say I have a 100-gigabyte VirtualBox image, and I copy it and delete the original. Let me go back one step: say I have 80 gigs free and my image is 100 gigs. I copy my image to image.2, and after 80 gigs of copying, I fill up my file system. At this point my file system is actually full. So I say, oh crap, okay — but if I delete the original file, it will not reclaim the space, because the file is in all the snapshots that have been taken. So if you need to reclaim space by deleting a file, and that file has been snapshotted, you actually have to go find all the snapshots that file is in and delete it there too; otherwise those blocks will not be freed until the snapshots get rotated off. That's one thing to know which is not very obvious when you're not used to this way of working. The next thing: you can just say, oh, you know what, I don't really care about my snapshots, I just want my disk space back, let me delete all my snapshots. Make sure, of course, you delete the right ones, because snapshots look like subvolumes — they all look the same — so make sure you don't delete the real one you care about. But after you delete all the snapshots, the disk space is not actually reclaimed right away. btrfs has a background process that goes and reclaims those blocks, and it can take minutes; sometimes it's taken me half an hour to get every single bit back. You get some of it right away, but not all of it. So I tend to run a watch command on the btrfs fi show you saw at the top, and I can see that number changing over time. Just something to know.
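The rebalance-and-watch workflow can be sketched as a dry run. The `-dusage`/`-musage` filters restrict the balance to chunks that are at most half full, which is usually enough to unwedge a "full" file system without rewriting everything (pool path hypothetical; `run()` only echoes — change it to `"$@"` to execute on a live, IO-busy file system at your own pace):

```shell
run() { echo "+ $*"; }   # dry-run wrapper

# Rewrite only data/metadata chunks that are <=50% used:
run btrfs balance start -dusage=50 -musage=50 /mnt/btrfs_pool
# Watch free space creep back as the background reclaim runs:
run watch -n 10 btrfs fi show /mnt/btrfs_pool
```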
So the balancing issue we already talked about — if you ever get hit by it, or before you do, you can go read the wiki where I wrote up some details. Also, btrfs is actually getting better at balancing for you now, so hopefully in the future you won't have to do any of this. Compression: pretty simple, you just give it the mount option. The way it works, you can change it every time you mount, and every new file you create after changing the mount option is compressed with the new scheme. You can also do a rebalance — rebalancing actually rewrites all the blocks, and by doing that you can also recompress to a different compression scheme. So, fragmentation. Things get a little bit complicated when you have disk images. Because of how copy-on-write works, if I have a 100-gig image and I modify a block in the middle of it, what btrfs does by default is write a new block at the end of my file — well, somewhere else on disk — because the old block is typically still used by snapshots. And even if you don't have snapshots: because it's a copy-on-write file system, it never wants to write in the middle of your file, because if your write is incomplete, you'd end up with half-written, corrupted data. So anything you write will always go somewhere new, and then maybe the bits in the middle of your file get invalidated and freed up. What this means, however, is that a big disk image will end up as a very fragmented file. In most cases it doesn't matter, but a disk image like a VirtualBox image will get fragmented badly. Turns out on my SSD I had many, many fragments and didn't even know it, but on a hard drive you would really pay for that. So there's a chattr command to tell btrfs: please do not use copy-on-write for that file, or that entire directory, to avoid the problem I just mentioned.
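Both knobs just mentioned, as a dry-run sketch (paths hypothetical; `run()` only echoes — change it to `"$@"` to execute for real). Note that `chattr +C` only takes effect for files created after the flag is set, so set it on an empty directory before putting images in it:

```shell
run() { echo "+ $*"; }   # dry-run wrapper

# Compression is a mount option; only files written afterwards use it:
run mount -o compress=lzo LABEL=btrfs_pool /mnt/btrfs_pool
# Disable copy-on-write for a directory of disk images, so in-place
# writes don't fragment them:
run mkdir /mnt/btrfs_pool/vm-images
run chattr +C /mnt/btrfs_pool/vm-images
```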
There's also a defragment command, but I found that it doesn't work well with very big files — it's just too slow, so you're better off copying the file into new blocks and deleting the old one. That's something that will need to be improved, because it should just do that better. Block deduplication — I did mention that. There are a couple of commands that can do it offline, and someone is trying to put it into the kernel so it would actually go find identical blocks and dedupe them for you. It's not quite there yet, but what you can do when you're copying data is use the --reflink option to cp, which basically means: when copying, only copy the inode, and point it at the same data blocks. And that, as I mentioned earlier, is much better than just hard links. btrfs send and receive. That was actually the killer app for me in btrfs, outside of snapshots. Who does backups with rsync here? Okay, a fair amount of people. You probably know that if you have 100,000 files, or a million files, rsync spends forever scanning all those inodes, then doing the same on the other side, and then saying, okay, now I know what to do. Sometimes it takes five minutes to copy the data and an hour to scan inodes on each side. With btrfs you don't need to do that: it knows exactly what changed between the previous snapshot and a new snapshot, so there's no scanning required. And it doesn't need to work out which part of a file changed or what it needs to send — it knows only those blocks changed, because it has been keeping track. So it sends just those changed blocks to the other side and reconstructs the new file system on the other side with very little data copied. That's what btrfs send and receive do. They take a bit of work to use from the command line, so there are scripts that do it for you.
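The underlying send/receive commands can be sketched as a dry run — an initial full send, then an incremental one (paths and the backuphost name are hypothetical; `run()` only echoes — change it to `"$@"` to execute for real):

```shell
run() { echo "+ $*"; }   # dry-run wrapper

# Full copy: take a read-only snapshot, stream it to the backup box.
run btrfs subvolume snapshot -r /mnt/btrfs_pool/home /mnt/btrfs_pool/home_snap1
run sh -c 'btrfs send /mnt/btrfs_pool/home_snap1 | ssh backuphost btrfs receive /backup'
# Incremental: -p names the parent snapshot both sides already have,
# so only the blocks changed since home_snap1 go over the wire.
run btrfs subvolume snapshot -r /mnt/btrfs_pool/home /mnt/btrfs_pool/home_snap2
run sh -c 'btrfs send -p /mnt/btrfs_pool/home_snap1 /mnt/btrfs_pool/home_snap2 | ssh backuphost btrfs receive /backup'
```

Snapshots must be read-only (`-r`) to be sendable, and the parent must exist unmodified on both sides.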
I wrote one — there's a link there — and there are others you can choose from. Now, part of my title was server image replication. I don't know if any of you were at my talk last year, but part of it was: one way to keep servers up to date is to not run apt or yum from cron, because you always end up in a state where the machine got rebooted halfway through a run, someone modified files, and the servers are in some inconsistent state. At that point, the only thing you can really do is delete the image and start over, which you can automate — that's perfectly fine. But one thing we do at Google, and have been doing for a long time, is just copy an image on top, at the file level. Now, with rsync — which I just mentioned — that would be expensive and slow, just because scanning all the inodes is expensive. But if you use btrfs send/receive, you can create a new snapshot on a server with very minimal data copied — which is what the new image should look like — and then just move your mount points to the new server image. Very quickly, that becomes the new live image on that machine. The other great thing: if someone's been modifying the live server in ways they weren't supposed to, that's okay, because you just wipe that out when the new image becomes the one you're pointing to. So it's one way to do image replication which, in my opinion, is much faster and more reliable than other means. For personal use — I did mention I have two drives in my laptop. I used to use rsync: I had my SSD, and an hourly cron job that would try to copy it to the hard drive, during which my mouse pointer was not moving nearly as fast, which is kind of sad, but true. Again, with btrfs send and receive, I have my hourly snapshots.
It knows exactly what changed and just copies those blocks to the hard drive on the other side, and it does that in a minute, sometimes even less. That is just very fast. Then I just need to move a symlink on the other side saying, hey, my user file system is now pointing to this snapshot. And if I ever boot from that hard drive, it will follow the symlink and mount the correct copy of the last snapshot that's up to date. That is just so much faster. Again, I have a script that can do that for you, so you don't have to write it yourself. So yeah, I did mention doing that. Why? SSDs: I did mention how they die. They're a little bit better nowadays, but if you think you can use SMART for SSDs, don't; SMART on many SSDs is just utterly worthless. My drives worked perfectly fine until, damn, they didn't. And of course, I wasn't home half the time. So make sure you have copies, and btrfs is a good way of doing that. The last bit I forgot: because one backup is not enough for me, I then back that up to my home server. And I used to do that with rsync, which again never worked on hotel wireless, especially with the latency: going back and forth saying, do you have this? Yes, I do. Do you have this? Yes, I do, multiplied by a million. It would barely finish by the time I got home. Now with btrfs, it knows exactly what to push, and I'm actually able to push images to my backup server overnight, on wireless, across the world. And that's kind of nice, in case I lose my laptop or something. So that was the point here. If you ever have to do backups of backups, it gets a bit complicated. I'm not gonna read the slide, you can look at the different ways of doing it. But you can do it with snapshots plus rsync. You can do cp with hard links first and then rsync on top.
And the last one is you can use cp --reflink, which makes use of btrfs. Now, for RAID, there's a webpage you can read for details. But the thing that's interesting about btrfs RAID is that you can set different RAID levels for metadata, which is all your inodes and so forth, versus data. So you can make your metadata more redundant than the data itself, which is like, well, why would you do that? Well, I could make my metadata RAID 1 and my data blocks RAID 0, which means if I lose a drive, sure, I'm gonna lose the data blocks on the drive that's gone, but at least my file system structure is still there. I can still run fine and see what files were there, which is better than having a scrambled file system where you can't even find out what you had and what you lost. The interesting thing when you do RAID on btrfs is you can change the RAID level, and all you have to do is a balance command; balancing will take all your blocks and re-lay them out using the new scheme. So, say I have RAID 1 and I add a third drive, I can make it RAID 5. All I do is say, hey, here's a third drive, and now you're RAID 5. What happens is that every new file is laid out in RAID 5 fashion, while the old ones are still laid out as RAID 1. So you actually have a mix until all your files get rewritten. By doing a balance, you can say, okay, take all the data you have and rewrite it, so it's now laid out in RAID 5 fashion. The same thing happens if you just add a drive: say I have three drives and I add a fourth, nothing really happens, just new files get laid out across four. If I want everything laid out again properly, I can tell btrfs to do that. And that's kind of nice, because it means you can decide when you're gonna take the impact of rewriting everything.
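That add-a-drive-then-rebalance flow is just a couple of commands. A rough sketch, with the device name and mount point made up, and requiring root on a btrfs pool:

```shell
# Add a third drive to an existing two-drive RAID 1 pool.
btrfs device add /dev/sdc /mnt/pool

# New writes can now use it; existing data stays RAID 1 until rewritten.
# Rebalance, converting data to RAID 5 while keeping metadata at RAID 1.
btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/pool
```

The -dconvert and -mconvert filters are how you set the data and metadata profiles independently, which is the split mentioned above.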
Or if you don't care about your old data, you can just leave it on three drives; you know it's gonna get deleted eventually, and all the new data is on four drives. So it gives you flexibility on how you do this. Shrinking is about the same thing: it basically rebalances everything to not use the drive you're about to remove, and then you're done. Or if you have parity, it can rebuild from parity. So, if I haven't convinced you yet, now is the time to have a look if any of those features interest you. It's not something where you have to keep waiting for that "production ready" label; even though in the kernel it's not quite marked production ready yet, it is really usable for many, many cases. You just have to evaluate it before you put all your servers on it, to make sure your workload and use cases are fine with it. But it's definitely used in production for many workloads already, and there are things like Docker or SUSE that specifically make use of those features. RAID 5 and 6, as I mentioned, you can play with them; I would not use them in production yet. Just put mdadm RAID 5 underneath if you need that. Block dedupe needs a bit more work, so if you care about that, feel free to contribute. And 2015 is the year for you to evaluate it. So there we go. I tried to speak a bit quickly to give time for answers. So have questions; I'm sorry, I'll give answers. What questions do you have? We're looking at using Postgres as a relational database. We've heard rumors that performance of databases on btrfs is quite poor. Is there any independent testing, or do you have any benchmarks available? So the big thing is whether you have snapshots and the whole copy-on-write issue that I just mentioned: many databases will try to write data blocks on top of one another, which with btrfs means new blocks get written elsewhere, and then you have fragmentation.
So I'm not a Postgres expert, but depending on how it lays out the data, it may not work very well with snapshots. It will work, but your performance will suffer a little bit. But you can configure that: it doesn't have to be the whole file system, you can just configure the directory to be no-copy-on-write. In that case you'll have performance that should be similar to what you would have with ext4. But I don't have benchmarks. I do use databases, but not in a way that's performance critical. Also, sorry, the other thing I forgot to mention is that a lot of them don't even want a file system, they just want to write to the block device themselves. So if you really care about performance, the answer is don't put any of that in between, just give them direct block I/O access. Correct. So he's mentioning that if you turn off copy-on-write, it turns off checksumming of your blocks, which then puts you back to where you were with ext4. Next question. The steps on your recovery slide, are they in general order of preference, like smallest stick to biggest stick? Yes, I believe I did put them in order of things to try. Correct, the last one being the fsck one, because that one is destructive. The other ones are read-only recovery, they don't modify, right? The restore one will try to grab data out and copy it somewhere else, so whatever you get back is a bonus, and it doesn't modify your file system. The last one will modify it. I've got a couple other quick ones. Sure. You talked about removal of a volume in order to shrink. Yes. Can you also do shrinking of individual volumes, like reduce the size of them? So within a btrfs pool (I call it a pool myself, I just think of it as a pool of blocks), subvolumes are just directories. So unless you're using the quota subsystem, there's no limit: any of those subvolumes can grow up to the full size of the pool. So there's no resizing going on, really.
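Setting a directory to no-copy-on-write, as mentioned in the database answer, is done with chattr. A minimal sketch; the path is made up, root is assumed, and note the flag only affects files created after it is set:

```shell
# Mark a directory so new files in it skip copy-on-write (and
# therefore also skip data checksumming), e.g. for database files.
mkdir -p /srv/pgdata
chattr +C /srv/pgdata
lsattr -d /srv/pgdata    # the 'C' attribute should now be listed
```

New files then inherit the attribute and get rewritten in place instead of fragmenting across new blocks on every write.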
Did that answer your question? No. Can you resize the underlying block device, to move from a 100 gig file system to an 80 gig one, say? Oh, right, right. You know what, if you can, that's basically modifying the size of your block device underneath. Every time I've done this, I've just added a new block device to the same pool, so I didn't have to resize, I just gave it new blocks to write to. And the only time you would really modify a block device size itself is if you had partitions, which, as I said, don't bother with anymore, or if you had LVM underneath, which, no, please, I mean you can, but don't run LVM with btrfs on top, there's no reason to do that. So the answer is, I'm not sure, but there would be no reason to do so anyway. Okay, and one last one. Does RAID 1 equal RAID 10 in btrfs if you have two replicas but more than two devices? Okay, so that one again? Yeah. In the standard RAID subsystem, you can have a RAID 10 device with three devices in it, for example. Right. And it will stripe those blocks across all of the different devices appropriately and still give you two copies of every block. Correct. Does btrfs RAID 1 do that? Yes, my understanding is that btrfs RAID 1 will basically ensure that you have two copies on two drives; you just don't have control over which two drives those are. With mdadm RAID 1, you can actually give it three drives and have your data written three times. If you give three drives to RAID 1 on btrfs, it will use all three drives but copy everything twice. That's my current understanding. And there are people asking, saying, hey, I want RAID 1 but with three copies, but I don't think that has been added yet, or if it did, it just got added. Next question? You said that there was a scrubbing tool. Yes. Do you have to take the drive offline to use that scrubbing tool? No, not at all. I mentioned it was online at the beginning.
So it's something you run nightly. The only impact is performance, because it will obviously keep your drive busy, but it happens in the background. Thank you. And it will syslog errors, so my script basically runs it, then finds the errors in syslog and emails them to you. Other questions? Over there? Hey, I was wondering, is there a way to determine what files are in a snapshot? You said before that you may have to delete all snapshots to regain the space from deleting a file. Is there a way to determine, using the btrfs tools, which snapshots contain that particular file? So the snapshots are subvolumes. If you go back to the root btrfs pool, you can go into each of those subvolumes and see if the file is present; that would be the easiest way of doing that. But the whole game of how much do I gain if I delete a file that's in multiple snapshots, which may have different versions and blocks of that specific file, that's kind of hard math. It's not the same file with the same size in each snapshot, so it's hard to compute what you're getting back. But if you just want to delete a file in all the snapshots where it's present, you can do that. You can also do an ls command where you put a star for the subvolume name, then give the whole path, and it will show you how many copies of that file there are and how big it is in each of those snapshots. So it gives you an idea of how much data could be freed; if they all look the same, you know it's one copy, for instance. I think we're good. So enjoy lunch. Thank you for coming.