I like this disclaimer because it absolves me of all responsibility. Those of you familiar with ZFS may feel that I'm not fully covering some issues, and that's on purpose, because this is for newbies. I can't go into detail. So you'll hear me say something and you'll think, that's not the whole story; and it isn't, but those little esoteric things I've glossed over aren't relevant to newbies. They'll become relevant once you've been using ZFS for a while. I am grossly simplifying, so stuff will be omitted and options will be skipped, because this is a talk for newbies.

What we're going to cover: all of that. It's just a copy of what was in the program.

A little history. ZFS started at Sun Microsystems, and in 2008 it became part of FreeBSD. I don't really know when I started using it; sometime between 2008 and March 2010, I don't know where in there, but I remember what I was doing at the time. Not listed here is Oracle's acquisition of Sun; ZFS is still developed in-house there. Most open ZFS development now occurs under the OpenZFS project. This is all stuff you can look up. Basically, you'll run out of hardware long before you run out of address space in ZFS.

ZFS is not RAID. Don't think of it as RAID; think of it as redundancy. You put your drives together into a pool, which we call a zpool. You can create a mirror consisting of two to N drives. You can create a RAID-Z, which usually has a number from one to three after it: that's the number of drives you can lose from that device without losing any data. I like to think of it that way instead of as traditional RAID, because if I have an eight-drive RAID-Z3 sitting there, I can lose three of those drives and the system still performs. Think of RAID-Z as buying you time to replace the failed drive. It's not keeping your data forever; it's buying you time to replace the drive before you lose your data. All of the above layouts come from zpool create.

A file system is part of a zpool. You put your drives in the box, you say these drives are going to be this zpool, and then in that zpool you create a file system, because you're drawing from your pool of disks to create a file system. A file system can have inherited properties. If you turn compression on for, say, /var, then /var/tmp, /var/db, and whatever else gets created in there will have compression on by default. You can override it at any step along the way. The big advantage of pooling your drives, instead of giving a partition to each little part of the system the way you used to, is that you don't wind up with no space at all on /var/db but 200 MB free on /usr, wondering what to do. You don't have to symlink /var/db/mysql to /usr/local/mysql, for example.

This is a zpool, taken from a DigitalOcean droplet of mine that runs a Nagios installation. This is the name of the zpool, how much drive space there is, how much was allocated, how much is free. FRAG is not the fragmentation you think of when you hear the word, so forget it. CAP is how much of the pool I've used, and yes, it's online. And DEDUP: don't use dedup. Friends don't let friends use dedup. You'll hear about it and you'll want to use it, but don't do it; use compression instead.
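A minimal sketch of turning compression on, with a hypothetical pool named zroot; the inheritance described above does the rest:

    zfs set compression=lz4 zroot           # set once at the top; descendants inherit it
    zfs create -p zroot/var/log             # a new dataset picks up compression=lz4
    zfs set compression=off zroot/var/log   # unless you override it further down
    zfs get -r compression zroot            # the SOURCE column shows 'local' vs 'inherited'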
So this is the whole file system based on that. This one is set up for beadm; sorry, boot environments. We'll talk about those later, but if you just ignore the zroot bit, you can basically see the file system, which is more or less mirrored over here. You don't have to do it that way: I could create zroot/dan and mount it over here under /usr/home/dan. Your mount points are separate from your file system names. These are the file system names all the way through here, and these are the mount points.

Let me just make sure I have everything. zroot is the name of the zpool. zroot/ROOT is a dataset, a file system, within that pool. zroot/ROOT/default, this one here, is a descendant of zroot/ROOT and is also a file system. So this is a file system, that's a file system, so is that. It does not have to be named default, by the way; you can name the boot environment whatever you want. We'll get into boot environments later.

This legacy bit here means that the boot loader sets vfs.root.mountfrom="zfs:zroot/ROOT/default", basically, in here. That's what the legacy means. It's derived from the bootfs property of the pool: for each zpool you can say, when booting from here, boot from there. Which means you can turn the BIOS off in your HBA so it doesn't iterate over all the drives, and it boots up faster. I generally boot from a separate drive connected directly to the motherboard rather than off the HBA.

Okay, what's a vdev? Again, grossly simplified, because I'm skipping spares, logs, cache, and files, which you don't want to know about, because this is a newbie talk. A vdev can be a single disk. A vdev can be a mirror of two or more disks. Or a vdev can be a RAID-Z; we've already covered all those terms. So why do you need to know about vdevs? Because you create a zpool from vdevs. Did I get that right? All right.

Some of the terms I'm going to use: I'm going to constantly refer to file systems, and I'm going to refer to datasets. They are the same thing; don't get confused by that. A dataset can be a file system, a volume (which is not commonly used), or a snapshot. We'll cover snapshots later, but that's a read-only version of a file system.

These are some interesting properties you can set on a file system. Set atime=off. If you don't, then every time you read something, ZFS writes something (the updated access time), so you double your I/O, and atime isn't really as relevant as it used to be. exec=off is useful, especially on something like /tmp, because it means someone can't write a file there and then run it, which defeats a common attack. A reservation says: keep 10 GB spare for this file system. A quota says: don't let anyone go over 5 GB on this file system. So you give space to your users, they have a writable directory, and you set a limit on it.

When it comes time to replace a failed drive, step one is the important bit: you don't want to remove the wrong drive. Some people do a very clever thing which involves putting the serial number in the device name, which can be useful, because then you can see which drive you're supposed to pull out. So add the new drive in; notice I didn't say remove the failing drive. Your drive may be wonky, but your system is still perfectly intact and the zpool is not degraded. If you pull a drive out, your zpool is degraded.
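A sketch of that replace-without-pulling flow, with a hypothetical pool named mydata, a failing da3, and a new da9:

    zpool status mydata            # confirm exactly which device is failing
    zpool replace mydata da3 da9   # resilver onto da9 while da3 stays attached
    zpool status mydata            # watch the resilver; da3 detaches when it completes

Only after the resilver finishes, and the failing drive has dropped out of the pool, do you physically pull it.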
And if another drive fails while you're degraded, you've got an even bigger problem. So if you can, keep your existing drive in there until this process finishes, at which point the failing drive is no longer part of the zpool and you can just pull it out.

This is a system where I did exactly what I'm talking about. This is an old case; I'm not sure if I still have it. I think there are three HBAs up at the top here, and I had to replace a drive. This was actually me replacing 3 TB drives with 5 TB drives, and yes, that drive is just sitting loose in the case; this may have been before I attached the power cords and such. I took one of my drives out of the drive bays down here, put it in there, put a new drive in here, and then started replacing the 3 TB drives with 5 TB ones, one at a time.

Don't use RAID cards. Just like I said before: friends don't let friends do something. What was it? Dedup. Well, friends don't let friends use RAID cards with ZFS, either. RAID hides stuff. The RAID card will try to fix something, and fix something, and fix something, and then say: oh no, this drive is dead, I can't do anything with it. And ZFS says: what? Where'd my drive go? Whereas ZFS will try to fix the problem for you. It'll say: okay, this drive is having trouble reading over here; let me go check the other drive, because I know the data is also over there. Oh, it's there, it's just fine; I'll write a good copy somewhere else. So ZFS will fix something if it can, and if it can't, it'll tell you that it failed. It won't silently give you a corrupted file. So: use an HBA, not RAID.

Scaling. If you need more space, you can do what I did earlier and upgrade all the drives: replace 3 TB with 5 TB. Or you can add a new vdev; we talked about vdevs earlier. So you've got a mirror of drives here, and you can just add another mirror (there's a one-line sketch of this below). What you can't do is take a RAID-Z-something and add new drives to it. You can't go from an eight-drive RAID-Z2 to a nine-drive RAID-Z2, because the math is hard, like really hard, although they are working on changing that. So when you hear that ZFS is not expandable, that's the bit that's not expandable. It is expandable in the sense that you can say: I have a zpool with eight drives in a RAID-Z2; here's another eight drives as a second RAID-Z2; stripe them, and it's done. You've now doubled your space.

Checksums are the single most important reason to use ZFS, even if it's only one drive. ZFS checksums everything and puts the checksums in metadata, hierarchically. You write data down here in a file and it records a checksum here, but that checksum is itself covered by the metadata above it, all the way up to the top of the tree. It's checksums all the way down. So when it goes to read something, it checksums the data and compares it with the checksum that's already on disk, and if they don't match, it tries to fix it; if it can't fix it, it tells you. The difference between that and a non-checksumming file system is that you find out about the data being wrong, instead of just being handed a photograph in which one bit somewhere is flipped. Instead you'll be told: here's this photograph, and the data's bad.
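That add-another-vdev expansion mentioned above is a one-liner; a sketch, assuming a pool named mydata and two new, hypothetical drives:

    zpool add mydata mirror da6 da7   # stripe a second mirror vdev into the pool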
Being told the data is bad matters less for a photograph, maybe, than for a file full of accounting figures where one bit has changed.

So, scrubs. A scrub reads all the data you've written to disk, compares it against the checksums, reports any errors it finds, and attempts to fix them. Always turn this on; it's very easy to turn on (there's a scheduling sketch below). I do it every seven days. You can do it every month, however frequently you want; just make sure you don't schedule it more frequently than it takes to run. I've got some scrubs that take about 24 hours, but they won't overlap.

Snapshots are the most fantastic thing for backups. One of the biggest problems with backups is that you start your backup at, say, midnight, and you're backing up all these files, and by the time you finish at 2:30 a.m., this log file over here doesn't match that log file over there, because one stops at midnight and the other stops at 2:30; they're constantly being written. What you can do with a snapshot, because it's read-only, is take the snapshot at midnight, and every single file as of midnight is in that snapshot. It does not change for the lifetime of the backup. So you take a snapshot and you back up the snapshot, not the live file system, and it's an instant in time. Snapshots are usually very quick: you do a snapshot, it's done. (Yesterday I discovered a snapshot that had been running since Friday the 13th. I had to reboot the system; nobody could figure out what was going on.)

Snapshots cannot be modified. I don't know whether they would pass, say, legal requirements for images of files, but they can't be changed, and that's why I like them for backups. Keep in mind that snapshots on the same host are not backups, but you can send a snapshot to another host. You've got a host on the east coast and a host on the west coast: take a snapshot and send it to the other host. You can send a snapshot from any system to another system, or even within the same system. It's basically zfs send piped into zfs receive; it's a little more complex than that, but it is a useful way to do backups. Send them wherever you want, because snapshots on another host are read-only and are a copy of the data offline, or at least offline relative to the first system.

We talked briefly about mirrors before: two or more drives with duplicate content. But you can also stripe over mirrors. You have a set of mirrors here and a set of mirrors there, and you stripe over both of them. It's sometimes referred to as RAID 10, because the 1 is the mirroring and the 0 is the striping over the two; RAID 0 is usually referred to as striping. In a stripe over two mirrors you can lose two drives, so long as they're not in the same mirror, before the pool dies.

We also briefly talked about RAID-Z. You need at least four drives to do RAID-Z, and that will get you RAID-Z1, but the more drives you have, the higher that number can go. And this is the beauty: you can lose n drives and still be operational, and by operational I mean the system just continues to run. It's not: oh my god, the system can't run because I'm missing two drives. No, it just runs.
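Back to scrubs for a moment: on FreeBSD, the every-seven-days scheduling mentioned above is typically just a periodic.conf setting. A sketch (the defaults are documented in /etc/defaults/periodic.conf):

    # /etc/periodic.conf
    daily_scrub_zfs_enable="YES"            # let periodic(8) kick off scrubs
    daily_scrub_zfs_default_threshold="7"   # scrub each pool when its last scrub is 7+ days old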
One of the neatest things I use in ZFS is mounting within mounts. I have a bunch of slow drives for the main system, two old spinning disks, and that's what I boot from. They're directly connected to the motherboard. But then I have a database that I run on this system, so I add in a couple of SSDs for that database, create the zpool on the SSDs, and mount it at /var/db/postgres, which is just a mount point on the main system. Your code doesn't care that it's on a different file system; it's completely hidden and it just works. You can also do this with /tmp: put /tmp on the fast drives as well.

Here's an example of me doing exactly that for poudriere. I have my zroot on the smaller drives here, but then I have my fast tank with a lot of space here, and you can see that poudriere is mounted at /usr/local/poudriere, which is just part of /usr, and it's completely transparent to anything else I'm doing. This is also another way to expand the system: if you don't want to touch your original RAID-Z, you can just put the space where you want it. You can do a whole bunch of mount points like this.

Boot environments. A boot environment is a ZFS file system designed for booting. There are a few requirements for that, but we're not going to get into what they actually are; we're just going to show what you can do with it, because you're going to like it. You manage boot environments with beadm or bectl. beadm has been around a bit longer; bectl is now in the FreeBSD base system, I think, and beadm is in ports. Basically, it saves your current boot environment, and the use case we're going to walk through is upgrading from FreeBSD 11.3 to FreeBSD 12. You save your current boot environment, which basically clones the environment you're booting from now. Once you've done that, you upgrade your current environment and reboot. If it's all okay, great. If it's not, you reboot, and during the reboot process you choose the boot environment that you saved away.

Here's an example. See this magic option here, number seven? I press seven here, and I get that, which is my default, and I change it to that, which is really difficult to read, but that's the other boot environment, and then I boot back into 11.3. So any time you're doing updates, any time you're doing major changes, you can use boot environments for something like that. You can also use them for booting new kernels, if you're doing a lot of kernel work. nextboot is the neat thing you can use for that, because all you do is say: on my next boot, boot from this. The boot after that goes back to what you were using before, so it is truly boot-once from this other thing.

Now we're going to go through some very simple configuration. Here we're creating a partition table on da0, and we add a ZFS partition. This is just 4K alignment, and this is a label, which is the serial number of the drive, and boom, that's what it looks like there: you've got the ZFS partition. Then I do the same thing with my other drive, and I say zpool create, this is the name of the zpool, it's going to be a mirror, and it's going to be on those two drives. And bang, there's my zpool. Now, this is actually an older zpool, because I had already done a scrub on it, and you can see the devices are called nvd-something. It doesn't match up with my other data, because this was taken from one system and then modified to suit the example. Ignore that part.
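Roughly the commands that slide sequence ran, with hypothetical device names (da0, da1), serial-number labels, and a pool named mydata:

    gpart create -s gpt da0
    gpart add -t freebsd-zfs -a 4k -l WD-ABC123 da0   # label the partition with the drive's serial
    gpart create -s gpt da1
    gpart add -t freebsd-zfs -a 4k -l WD-DEF456 da1
    zpool create mydata mirror gpt/WD-ABC123 gpt/WD-DEF456

Building the pool on the gpt/ labels instead of raw device names is what makes the which-drive-do-I-pull question easy later.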
So basically, here's mirror-0, and if you had a stripe of mirrors, you would duplicate these three lines just below here; we'll see that in the next example. Here we've got four drives, and we're going to create a RAID-Z1. Again, it's going to be called mydata, it's raidz1, and there are the four drives. For RAID-Z2, we just add another drive; we'll get to RAID-Z3 next. There's the RAID-Z2, and you can see the fifth and the sixth drives; one of them has a serial number in its name. There's the RAID-Z3 with six drives. And here's a RAID 10: you basically give it a name and say, mirror those two drives, and mirror those two drives, and you can repeat that process as far down the page as you have drives. So there's the RAID 10: a mirror, a mirror, striped over the two.

Let's talk briefly about quotas. A quota is set on the dataset, and it's the limit on how much space can be used; it includes the descendants, so if you let your users create new file systems, those are all counted too. The other thing is that it includes snapshots: if you take a snapshot and then delete a five-gig file, the space isn't freed, because the snapshot still references it. Keep that in mind when you make snapshots; that space stays around until you delete the snapshot. And I don't want to get into the rest of this stuff, because even I have trouble figuring out what it means.

Remember to scrub; you can create a Nagios script for that. Also run zpool status, because that tells you if there are any problems. Use quotas, and monitor the capacity; there are scripts for doing all of that easily available (the commands those scripts wrap are sketched below, after the booting notes).

Now we're going to go through some myth-busting, because there's a lot of bad advice out there. I said this before: a single drive with ZFS is better than no ZFS at all. Just try it, and you will enjoy it. ECC is not required; people will say you have to use ECC RAM with ZFS, but ZFS without ECC is still better than no ZFS at all. You need high-end hardware? No, you don't. Most of my systems are consumer-grade, definitely not enterprise. You can get an HBA for about $100 off eBay. I do have some SuperMicro chassis, because I bought myself birthday presents, and they are very much nicer than what I had before. When you're looking for hardware, look to the FreeNAS community, because they have figured all this out already, especially for consumer-grade stuff. Lots of RAM? That's not true either. I have a system running with one gig of RAM, and it runs with about 250 MB free; that's the DigitalOcean droplet I mentioned. It's very slow when I go to the web page, but it's just for monitoring. I don't need speed; I just need it to monitor.

This isn't a tip from last night; this is more like a tip from sometime in August. Put your operating system on a ZFS mirror, and put your data on the rest. You could actually use UFS for the OS instead, but whatever you do, don't boot from the HBA. The reason I say that is, if your boot drives are spread across seven or eight drives in your RAID-Z, the HBA is going to choose one of them to boot from, and that means iterating through all those drives during the boot process.
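Here are the health and capacity checks mentioned above under monitoring; these two commands are essentially what the scripts wrap:

    zpool status -x                  # prints 'all pools are healthy' when there is nothing to report
    zpool list -H -o name,capacity   # script-friendly output for capacity alerting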
You can speed up your boot process by turning off the boot BIOS on the HBA if you're booting from drives directly connected to the motherboard. And the main reason I like booting from the motherboard is that it eliminates any problems you might have with the HBA. If your RAID-Z gives out, or the HBA gives out, you can probably still boot off the motherboard. I just think it removes complexity from your booting.

I got some tips from Savage Lake. She's here; I don't know if she's in the room, but she is at the conference. Oh, that's not good: there's this one button down here that takes me out of it. I've already mentioned this one: tell your BIOS to ignore the HBA, so that the drive scan goes faster. You can safely partition the SSDs you're using for the OS. Basically, you get your two SSDs and put the OS on them; you don't need all that space for the OS. You only need, what, two or three gig for the operating system? So you can partition other bits of those SSDs and use them for other things: create another zpool, so the same drives are involved in multiple zpools. I do that at home. Set your recordsize very big on datasets that hold large files. You can gain another five to ten; it was a significant percentage. I don't want to say exactly how much, but it was a significant percentage when we changed the recordsize on some datasets.

When I restarted this, it lost my timer. I've been going about a half hour now, right? We started at 10:30. So we covered a lot of stuff; I went through that way faster than I thought I would. Coffee helps. There must be questions, because there's a lot here. I bet this mic can move, just so it gets recorded.

Two questions from a newbie. You said that snapshots are good for backup, but they are read-only. So how do you restore?

You just restore as you normally would: you restore over the main file system. You don't restore into the backup.

But if you send your snapshot back, it's still read-only, right? So you remove the read-only flag and you're done?

No.

Then how do you make it read-write again?

There's a hidden directory called .zfs at the top of every file system. If you just want one file, you can cd into .zfs/snapshot/<snapshot name>, go down to where you have the file, and copy it over. Or you can do a zfs rollback, but that rolls back everything.

What I mean is: suppose you make a snapshot of your dataset and send it to another data center, and in your original data center there was a disaster, so you lost everything. Now you want to restore, and you have a read-only snapshot waiting for you.

That's a good point: is the thing that you sent actually read-only? No, I don't think it is, because what arrives is not a snapshot. Has anyone done this already? What you've received is not read-only, right? Yeah. Use the mic and repeat that, so it's recorded.

Right: you do a zfs send of your snapshot over to your other system and pipe it into zfs receive, which unpacks it as a new dataset on that system, and it's read-write on the other system.

Thank you, Dan. I have a question regarding terminology. You said "zee-F-S", but "zed pool". Why?

Habit. "Zed-F-S" doesn't sound very good. And, yeah: Canadian. So it is "zee-F-S", and, I don't know, my vocabulary changes. What do Americans say?

Another question. To keep the flame wars, I mean the religious wars, away: can you give us a brief comment on, or your comparison of, ZFS versus btrfs?

Oh, that's easy. I've never used btrfs.
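In command form, the send/receive sequence from that exchange looks roughly like this, with hypothetical pool, snapshot, and host names:

    zfs snapshot mydata/home@2019-11-01
    zfs send mydata/home@2019-11-01 | ssh backuphost zfs receive -u tank/home
    # the snapshot itself stays read-only, but the received dataset tank/home is
    # an ordinary read-write dataset unless you 'zfs set readonly=on' it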
When you were asking about the zpool: why is it "vdev"? A zpool consists of vdevs, right?

Right. I always have trouble remembering this myself. I just know that I create a zpool out of vdevs.

I would have expected "zdevs".

It stands for virtual device, I think.

I have a question about the partitions. Before you add a drive to a zpool, you partition it with GPT, and I don't know why. Is it because the drives may have slightly different sizes or something? Linux tends to just give ZFS the whole drive; FreeBSD tends to partition it and give it the partition.

That allows you to do a lot of things. I think the main reason is that you can do a lot of magic with GEOM if you have partitions, but you can't if you have the raw drive. The other reason is: give it a partition and don't use the whole drive, so that if you have to replace it with another drive which may be slightly smaller, even though it's advertised as the same size, you can just make the partition a little smaller and you won't run into a problem.

More questions? So, who hasn't used ZFS yet? Oh, that: it's been around a long time. It works; it's pretty solid. And once you get into the bigger drives, you don't have to worry about the fsck at the beginning of boot.

No more questions. We're done. Thank you.