I'm going to talk about ZFS and advanced integrations between it and FreeBSD. So my name's Allan Jude. I've been a FreeBSD server admin for 16 years now. Recently, I got more involved in the project, becoming a documentation committer in 2014, and then a source committer in 2015, and then I was elected to the FreeBSD core team in 2016. I've also co-authored two books with Michael Lucas, FreeBSD Mastery: ZFS and FreeBSD Mastery: Advanced ZFS. So if some of the stuff here is a bit advanced and you want to start at the beginning, the FreeBSD Mastery: ZFS book covers the very basics of ZFS and getting it set up and using it on your computer, and the Advanced ZFS book covers things like tuning it for running database servers and things of that nature. For my day job, I'm the architect of the ScaleEngine CDN, basically a live video streaming company that I started. And I also host a weekly podcast, BSDnow.tv, where we talk about the latest news in all of the BSDs and interview developers and sysadmins and other people from the community. And we have 230 episodes in our back catalog now, so there's always something that you're interested in there somewhere. And I use a lot of ZFS myself, with over a petabyte of storage that I manage by myself. So I've been doing it for a little bit. So for a quick overview, for people that might not be that familiar with it: ZFS is basically the combination of a file system and a volume manager. So it's software RAID that is intimately familiar with the file system. It knows exactly which blocks are in use and which ones are free, so when you replace a failed disk, it only has to copy over the data that was in use, and it gets to skip all the free space and so on. And it solves the partitioning problem. Instead of taking your hard drive and slicing it up into a bunch of file systems and having to decide at the beginning how big /var is going to be and how big your home directory is going to be, it takes the pool approach. It puts all of the storage in one big pool and you build many file systems on top of it, and they all take space as needed out of that pool. And you can also create reservations to ensure your log file system doesn't run out of space just because people are filling up the home directories with movies or something, or quotas to stop a certain dataset from using more than a set amount of space. So this way you don't get the free space fragmentation problem where you have 50 gigs free in this partition and 50 gigs free in that partition and you have a 70 gig VM image and nowhere to put it. It solves that as well. And importantly, along with any data you write to the disk, ZFS stores a checksum in its metadata, so that when it reads that data back from the disk, it can tell if the disk has flipped a bit, or the read came from the wrong sector, or something else is wrong and the data is not actually what you wrote out originally. And because it's a software RAID implementation, whether you have a mirror or what's called RAID-Z, which is like RAID 5 or 6, or even triple parity, it can read from one of the other blocks and recover it. It also solves the split-brain problem with a mirror. If you have a mirror and it turns out the same sector on the two halves of the mirror has different data, how do you know which one's the right one? Well, if you have a checksum, you know which one's the right one, and you can replace the incorrect one with the correct one, and now both sides of your mirror agree again. Whereas with a hardware RAID, or even most other software mirroring, you would have to just pick one of the disks to be the winner and have it overwrite everything on the other disk until it matched.
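As a minimal sketch of that pooled model (the disk names, pool name, and sizes here are all hypothetical):

    # Create a mirrored pool; every block written is checksummed.
    zpool create tank mirror ada0 ada1

    # Many file systems, all drawing from the one pool of free space.
    zfs create tank/home
    zfs create tank/logs

    # Guarantee the logs some space, and cap the home directories.
    zfs set reservation=10G tank/logs
    zfs set quota=100G tank/home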
It also supports transparent compression. So before data is written to the disk, you can run it through the very fast LZ4 compressor, which can compress a gigabyte per second per core even on a laptop. So it's not going to slow down your writing to the disk, but if your data compresses two to one, like a base install of FreeBSD compresses about two to one, it means you can write twice as fast as your disk is actually capable of writing, because you compress the data, then write. And it's all copy-on-write, so you get instantaneous snapshots, and you can create clones, which are basically writable snapshots. And it means that in the event of a crash or a power failure in the middle of writing a file, the old version of the file is still preserved somewhere else on the disk and it can roll back to that, instead of having overwritten the first half of the original file but not the second half and ending up with corrupt data. And as we said, because it's a combination of the file system and the volume manager, you create many file systems on top of the pool, and each one has tunable parameters: you can enable or disable the compression, or create separate snapshots, or many other tunable settings that we'll talk a little bit about coming up. So a lot of what we're gonna talk about is based on the snapshots and clones you can create in ZFS. All data is copy-on-write, meaning instead of overwriting a file when it's modified, new space is allocated, the new version of the file is written over here, and then the index is updated to point to the new version; if nothing else uses the old version, it becomes free space. But if you've taken a snapshot of the file system before you made the change, we keep the old version as well, and now you have two copies of your file system, one before and one after, but it only takes up space for the amount of difference there is. Any blocks that are the same between the two file systems are shared, and you don't have to have twice as much space to store two versions of every file. So blocks are referenced by snapshots, and when there are zero references left, a block becomes free space and can be overwritten later. But the snapshots allow you to access what a file looked like at the time you took the snapshot. So if you have a production database and it's running, you can take a snapshot every 10 minutes, and then when you find out, well, five minutes ago this table got accidentally deleted, there's a snapshot that you can mount and get what the database looked like 10 minutes ago, and you can undelete your data. The best part is, unlike a lot of other systems like LVM, there's no performance impact for how many snapshots you have, because it's always copy-on-write: you're always writing new data to a new place and then either freeing the old data or not. If you have one snapshot or 100 snapshots or a thousand snapshots, it doesn't affect how long it takes to write new data to the disk. The worst impact it can have is that listing all the snapshots can take longer if you have 10,000 of them, but that's not something you do that frequently. And the snapshots don't really take any additional space; it's like 100 kilobytes of metadata or something, and if nothing changes, no extra space is consumed. It's only when you overwrite a file or change a file that the original version is kept along with the new version, and so that consumes an amount of space equal to the amount of difference between the files.
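A minimal sketch of that snapshot workflow (the dataset name and timestamp are hypothetical; in practice you'd schedule this from cron or a tool like sysutils/zfstools):

    # Take a snapshot of the database dataset every 10 minutes.
    zfs snapshot tank/db@2018-02-04_10:00

    # Read-only access to the old version, no mounting required:
    ls /tank/db/.zfs/snapshot/2018-02-04_10:00/

    # Or roll the whole dataset back to its most recent snapshot:
    zfs rollback tank/db@2018-02-04_10:00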
But by using clones, which are basically a writable version of a snapshot, you make a second file system that is exactly what that snapshot was, and you can change it, but any blocks that are still the same are still shared, so they don't consume any additional space. It allows you to fork a file system, kind of like when you fork a repository on GitHub. A popular use of this is when you want to test an upgrade, where you're gonna upgrade the database or the software that uses the data. If you snapshot the file system and clone it, you can run the upgrade process on the clone, and if it works, you can promote that to being the master, or if it didn't, you can throw it away, clone again, and try again until you get it working. All this without actually taking up any additional space. So we apply that concept to your root file system. If your root file system resides on ZFS, you can take a snapshot of it before you do an upgrade and create a clone of that. So now you have two root file systems, your live one and one called 'before I upgraded', and then you install the upgrade and it works and everybody's happy, or you install the upgrade and it doesn't work, and from the FreeBSD boot menu you select the 'before I upgraded' clone of your root file system, and your system is now back to what it looked like before you did the upgrade. But because ZFS lets you create many file systems, it means that you have control over the granularity. So your home directory doesn't go back to before the upgrade, and you don't lose any changes you made there. You only roll back the file system that contains /usr/bin and /boot. But your mail spool, your logs, your database, none of that gets reverted, only the files you actually want to roll back. And the boot loader automatically builds a menu out of the most recent eight boot environments and allows you to roll back to any of them, and you can also go forward again, because they're clones: you forked the file system, so you have both copies. So you can also mount one and cherry-pick files out of it, or fix it, whatever you want. And again, you don't end up taking up twice as much space for two installs of the operating system. The only additional space consumed is however much data was changed by the upgrade. So if it was just a security patch or something, it's only gonna be on the order of a couple hundred kilobytes. So there's an existing tool to help you manage this, in the ports tree or from packages, called beadm; it's a port of the Solaris tool of the same name. However, beadm is a shell script and lacks some of the features I would like to have, which I'll talk a bit more about later in the presentation, so this summer I mentored a Google Summer of Code project to build a new tool, be, and a library to go with it. This will allow you to do more complicated boot environments and do them via a library, so that you can manage them for an appliance, or from a web interface, or from a graphical interface in your desktop environment.
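A minimal sketch of that test-the-upgrade-on-a-clone pattern (the dataset names are hypothetical):

    # Snapshot the live data, then clone it into a writable copy.
    zfs snapshot tank/db@pre-upgrade
    zfs clone tank/db@pre-upgrade tank/db-test

    # ...run the upgrade process against tank/db-test...

    # If it worked: promote the clone so it owns the shared blocks,
    # and the original can then be renamed away or destroyed.
    zfs promote tank/db-test

    # If it failed: throw the clone away and try again.
    zfs destroy tank/db-test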
So the new tool will better support the file system properties and other settings you can have for the boot environments, but it will also support the concept of a deep boot environment, where instead of controlling just the root file system and reverting it, you can ensure that your /usr/src directory has the matching source for the matching kernel in each of your different boot environments, and have a whole hierarchy that can step forward and backward as you switch boot environments. But again, your home directory stays off to the side and doesn't get versioned, so you always have your latest home directory. And part of the reason we wanted the library is so that we could hook it up with, say, pkg, so that you automatically create a new boot environment before you run a package upgrade. And so if the package upgrade installs a version of X that makes your desktop environment not start next time, you can just reboot and choose the snapshot from before, and now your desktop environment works again. Which is really handy when you update your laptop before, say, coming to FOSDEM and giving a presentation. So if you look at this, this is the output of zfs list, which lists all the file systems I've created on my laptop here. You can see that there's a hierarchy called ROOT, and under it are a number of file systems. There's one called default. These two are actually snapshots of default, one after the install and one after I'd installed some security patches. And then these two snapshots have been cloned to make actual file systems out of them instead of just snapshots. And as you can see, the difference between what I'm running now and what was 11.0-p0 is a whole 1.75 megabytes. So it doesn't cost me much to keep that around for a while, because it's only consuming less than two megabytes of disk space. But if I want, from the boot menu, I can select either of these versions and go back to them. And then there are the rest of the file systems on my pool. So I have a /tmp directory. The /usr dataset is created there, but it's not mounted. This is so that the files that live in /usr/bin and /usr/sbin will fall through and live in the root file system, and that's the one that we flip back and forth with the boot environments. If you look at the previous slide, this ROOT/default is mounted at /, and these are the alternate root file systems I could use if I wanted to. Whereas my home directory is a completely separate file system, and it won't go back in time if I decide to boot a month-old boot environment. Because the slides are only saved in today's boot environment, so if I roll back a month, I need to be able to get the slides that only exist today. /var, again, is just a parent; it isn't mounted. That's so that /var/db/pkg becomes part of the root file system, so that I can undo a package upgrade if it doesn't go well. But my audit logs and my crash dumps and my log files, I want those to always go monotonically forward and not go backwards when I switch boot environments. And I don't want to lose my email. So one easy example of this is obviously this laptop. It uses boot environments, and if an OS upgrade or a package upgrade goes sideways on me, especially since I updated this laptop on Wednesday before I flew here, everything looked like it was working, but you never know when it's gonna be the graphics driver deciding it doesn't like the projector, or suddenly the PDF viewer segfaulting instead of starting. So I can just reboot, select the boot environment from Tuesday, and everything works again and I can do my presentation. But then I can also go back forward tomorrow and figure out why the PDF viewer is segfaulting. And like I said, it preserves my home directory, because my slides have only existed for a small amount of time.
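A minimal sketch with the existing beadm tool (the boot environment name is just an example):

    # Before an upgrade, snapshot and clone the root file system:
    beadm create before-upgrade

    # See the boot environments; the loader menu offers the same set:
    beadm list

    # If the upgrade goes badly, make the old one the default and reboot:
    beadm activate before-upgrade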
But sometimes you want to go further. So this is all what exists today; you can have this in FreeBSD 11 and it's all happy. But what I would like to do, taking it forward, is this concept of a deep boot environment. Some users or developers have more complex needs and preferences. In particular, instead of just having FreeBSD 11 on my laptop, what if I'm a ports person and I wanna test the same software on FreeBSD 10.3, 10.4, 11.0, 11.1, and head? I can have a boot environment for each of those and easily switch between those versions, and again keep my home directory. But I'm gonna want /usr/src to match, because I can't compile the VirtualBox kernel driver without the matching source code for the running kernel. So we'd actually like /usr/src, and probably /usr/obj, to live as part of the boot environment instead of being separate file systems, so that when I switch, I get the matching source tree. There's an rc script that exists for this, called zfsbe, that will mount the boot environments that are children of the file system you select as your root. But the management tool beadm doesn't know how to recursively clone those extra file systems, and so that was the idea behind writing the be tool as a GSoC project. So what this would actually look like is, I have another boot environment called 'newest', which maybe has FreeBSD head in it, and then I can also clone that. And again we see that the one copy of the source tree cost me 1.34 gigabytes; a second copy of it cost me zero bytes, because there's no difference between those copies right now. But if I change one file here, you'll see this dataset go up by however many kilobytes that is, for the changes I made. So how we've taken to using this where I work is actually building a boot environment once, in a clean environment, and then shipping it to 100 servers and rebooting onto it. We basically make up an image with all the latest security fixes and any customizations we want, and then we use ZFS's replication feature to serialize that file system as a stream, in this case piping it into xz, compressing it, and putting it out to a file. We throw that on our image server, and then on each host we just download that file, unxz it, and pipe it into zfs receive, and now that exact file system exists on the server. And then currently we mount it under /tmp temporarily and copy over some of the config files, so we don't lose master.passwd and a couple of other files. And then we can use the zfsbootcfg tool to say: the next time, boot off this boot environment. So instead of changing the default one that is gonna boot every time, we say just the next time boot off the new one. If it doesn't work, we just power cycle the machine, it boots back to the default one, and the upgrade fails. But if it does work, we can burn that in and say that is now the default boot environment, and it'll boot that every time.
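A minimal sketch of that image pipeline (the host, file, and dataset names are all hypothetical, and the boot-once string is the 'zfs:dataset:' form from zfsbootcfg(8)):

    # On the build machine: serialize the boot environment and compress it.
    zfs snapshot tank/ROOT/11.1-p6@deploy
    zfs send tank/ROOT/11.1-p6@deploy | xz > 11.1-p6.zfs.xz

    # On each server: download and receive it as a new boot environment.
    fetch http://images.example.com/11.1-p6.zfs.xz
    xz -dc 11.1-p6.zfs.xz | zfs receive tank/ROOT/11.1-p6

    # Try it exactly once on the next boot; a power cycle falls back.
    zfsbootcfg "zfs:tank/ROOT/11.1-p6:"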
How we'd like to enhance this is, we've created a new dataset called /cfg and added an extra line to our loader.conf that mounts it before /etc/rc runs. In this way we can have some symlinks in our /etc directory that say rc.conf is actually over in this /cfg dataset or something, so that we ship a boot environment that's the same on all of our servers, and the little bit of personality that each server has lives in this other dataset that persists through the upgrades as we switch versions. And it saves us from having to use mergemaster or etcupdate or something to try to merge changes into the /etc directory. Luckily, most of the tools I have to deal with in FreeBSD now support including other config files, like newsyslog and syslog and cron and everything, so I can create those in a separate place and have them included, meaning I don't have to modify the default FreeBSD config files at all. But this means that I can do either a minor or a major upgrade to a remote server in just a couple of seconds. I receive the image, lay it down, and then say reboot onto it, and the machine reboots, comes back, and I have the new version of FreeBSD. Then I maybe have to run a package upgrade if I've done a major version upgrade, but then the system upgrade is complete. So what we'd like to do is replace, or provide an alternative to, NanoBSD, which is a tool in FreeBSD that was used previously to build appliances like FreeNAS, pfSense, and so on. The way it worked with UFS was, you'd take your hard drive and partition it in half, and you'd have two images: you'd boot off the one, and then when you install an upgrade or a newer version, it would put that in the second partition and boot off of it instead, and if that failed, it would fall back to the first one, and you'd basically keep swapping back and forth as you upgrade, always having your one older version to fall back on. But with ZFS, you can have the last 10 versions to fall back on if you want. And both FreeNAS and pfSense have actually switched to this in their latest versions, using ZFS and boot environments to provide that feature, because now you can have as many old images as you want instead of only one. But you still get your firmware-style updates, plus you get all the extra features of ZFS for free when you do this. So what we'd like to do is enhance the next-boot functionality. Currently, zfsbootcfg allows you to write the name of a dataset into a little slot in the ZFS metadata, and when the boot loader sees that, it boots that dataset and then erases the entry so that it won't do it the next time. What we'd like to do is combine it with a feature they're developing in illumos, where if the system fails to boot three times in a row, and it keeps a counter for this, on the fourth time it'll actually boot a rescue environment. They use this on Amazon AWS, because they have no console access. So if their appliance won't start three times in a row, it boots into a different environment that they can always SSH into, and then they can mount the broken system and figure out what's going on, you know, on Amazon where they don't have any other recourse to figure out why the appliance won't start. So we'd like to offer that and a couple of other optional features, where instead of just unstructured data saying 'hey, boot this environment', we'd have some kind of structured thing where we could support both next-boot and fail-boot and all the other features we might want.
With ZFS on FreeBSD, you currently have two options for disk encryption. The first is FreeBSD's native block device encryption, GELI, which supports AES-XTS and AES-CBC. This basically encrypts a full block device with one key per disk, and it supports booting from encrypted pools: instead of having to have an unencrypted /boot partition with the kernel and the modules and then the root file system encrypted, with the new version only the 128K bootstrap needs to be unencrypted. So you have a small 128K partition with the gptzfsboot bootstrap in it, and then a ZFS pool that's completely encrypted, and it boots, asks for the passphrase, and mounts the encrypted file system. EFI support was supposed to land in December, but will probably land in March instead; this will allow you to do the same thing when booting from EFI rather than legacy BIOS. Currently it does require you to be at the console to type in the password. Moving the prompt from the boot loader to the bootstrap, which is the thing that loads the boot loader, means that a serial console isn't available unless you have BIOS serial redirection, because the bootstrap runs before the boot loader, which is where regular serial console support lives. One thing that we're looking at addressing, hopefully before the end of this year, is support for a USB stick with the key files on it, to support unattended reboots of a server, but currently that doesn't work. Option number two, which is not available in FreeBSD yet but is coming, is native encryption in ZFS. With this, not everything is encrypted, but the user data and the file names are encrypted, and it uses AES-GCM or AES-CCM. Part of the reason for that is you get the checksum for free with the encryption that way. So not all the metadata is encrypted; pool-wide information has to stay unencrypted so that the pool can find out how to decrypt things. But this allows you to make the decision per file system whether you want it encrypted. So you can say, I only want my home directory encrypted, not everything, or vice versa. The other nice thing about this, compared to full disk encryption, is that because you have these options and keys per file system, you can actually ensure that data is at rest. Full disk encryption isn't very useful once the machine's booted, because you've already mounted the drive and it's decrypted now, but if you have many file systems and each one has a separate key, you can unmount a file system that you're not using right now, and now that data's at rest and actually protected by the encryption even though the machine's still on. The other nice thing is, what they did is split the metadata where we store the checksum in half. The first half is the checksum of the clear text, and the second half is the checksum of the cipher text. This allows scrubs, and the resilvers that replace failed disks, to be done without the encryption keys being loaded. So the storage administrator doesn't have to have access to the encryption keys in order to keep the system up and replace failed drives. You can resilver encrypted data without having the encryption keys loaded, which makes everything a lot safer. And you can have different keys for different datasets, and you get the regular inheritance if you want. Again, this is available in ZFS on Linux and should be available in illumos once it passes some more code review.
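A minimal sketch of what that looks like, using the ZFS on Linux command syntax (the dataset name is hypothetical; as noted, this hadn't landed in FreeBSD yet at the time of this talk):

    # Encrypt just the home dataset, with a passphrase-derived key.
    zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/home

    # Put that data at rest while the machine stays up:
    zfs unmount tank/home
    zfs unload-key tank/home

    # Later, load the key and mount it again:
    zfs load-key tank/home
    zfs mount tank/home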
If you're building an appliance or something on FreeBSD, there's another interesting feature that's available now, called channel programs. Before now, each ZFS operation you might do, like creating a dataset, or destroying a dataset, or renaming a dataset, is a transaction, and so it requires all the pending data to be written out, the transaction group to be closed, and a new one opened. If you were trying to rename 18 datasets in a row, you'd have to do each of those separately, and it can take 15 seconds. With channel programs, there's a little bit of Lua in the kernel. Basically, you create a short script that the operating system can then run, so it can iterate over a list of datasets as input and perform all those operations in a single syncing context. So it becomes one atomic transaction instead of 18 transactions. They created this at Delphix, one of the upstreams of ZFS, to handle upgrades of their appliance, where they need to rename some of the datasets out of the way and then make the new datasets, and they had this all scripted. But the problem was, if there was a failure halfway through, they'd have to try to undo all the renaming and figure out how much they had done. Whereas with a channel program, they do all the operations and it either succeeds or it fails. There's protection against bad things happening, by having limits. So while you can run this semi-arbitrary Lua script in the kernel, there's a limited number of instructions it can perform before it'll be aborted, and it has a limited amount of memory, so you can't create an infinite loop or something. There's a limited set of scripts that are built in right now; it doesn't support you loading custom scripts at runtime. It probably will eventually, but right now there's a small library of scripts that are built in, and you can do things like recursive rollbacks, rolling back a whole hierarchy of file systems. And again, for the upgrading process, there's another feature called zpool checkpoints. The idea here is that it's like a pool-wide snapshot. It backs up the top-level metadata of the whole pool in this checkpoint, and it suspends all freeing of space. So once you create the checkpoint, any disk space you free doesn't actually get freed yet. But you create one of these before you do the upgrade process on an appliance, especially a headless appliance. You make all the changes, rename some datasets, destroy some file systems, whatever you wanna do, and then if your upgrade process fails somewhere in that process, you can just re-import the pool at that checkpoint, and all the operations you've done are undone. Now, these are not meant to live long-term, because they cause no space to ever be freed, so that you'll be able to go back. But it allows you to basically save the clean state, try the upgrade, and if it doesn't work, undo any of the operations you've done. Even adding whole new disks can be undone. So it's a great way to safeguard against a mistake when you're trying to add new disks to your pool: you just create the checkpoint first, and you'll be able to undo anything that goes wrong. And when you go back, it's as if it never happened.
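A minimal sketch of that checkpoint safety net (the pool name is hypothetical):

    # Save the pool-wide state before doing risky surgery:
    zpool checkpoint tank

    # ...rename datasets, destroy file systems, even add disks...

    # If something went wrong, rewind the whole pool to the checkpoint:
    zpool export tank
    zpool import --rewind-to-checkpoint tank

    # If all went well, discard it so freed space can be reclaimed:
    zpool checkpoint -d tank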
So, when we get to the end, for questions, I'd like to hear if anybody has any ideas on what would make ZFS work better for you, whether it's on a laptop or a server or an appliance. There are a number of interesting features coming up soon. The FreeBSD Foundation, Delphix, and iXsystems have partnered together to bring the most requested feature in ZFS to us, which we hope to have by the end of this year: RAID-Z vdev expansion. So if you have a RAID-Z2, like a RAID 6, of six hard drives in your home NAS or something, this feature will allow you to add a seventh disk and get that additional free space without having to rebuild the pool. It's the single most requested feature in ZFS, and with the collaboration between illumos and FreeBSD, we hope to have it by the end of this year. And then I have a couple of slides that are rundowns of other interesting features in ZFS that are coming up. The first one is Zstandard compression. This is the one I'm working on myself. Facebook has a new compression algorithm called Zstandard (zstd). It gets compression ratios similar to gzip, but is somewhere between four and eight times faster. By integrating that with ZFS, compared to LZ4, the fast compressor that ZFS defaults to now, it gets about three to one compression on a base install of FreeBSD, versus about two to one, and at higher levels you can do even better than that. When combined with the compressed ARC feature, where data that's compressed on disk stays compressed in the cache in RAM, it means that on a database or something of that nature, you can fit a lot more data than you have RAM into your RAM cache. So suddenly a 40 gigabyte database compresses to 10 gigabytes, and you can cache the whole thing in 16 gigs of RAM and get speeds that you couldn't think of for a database before. One of the interesting things with Zstandard is it has 21 levels of compression. So you know how gzip has one through nine? This has one through 21. One of the interesting things you can do with that is what's called adaptive compression, where you modulate the level of compression you're using to keep a pipe full. So if you're going to, you know, cat a file and pipe it into netcat, you pipe it through zstd in the adaptive mode, and it will change the compression level to keep netcat's buffer full, but not spend time compressing for no reason. It finds the compression level that keeps the buffer full but doesn't overflow it. So you could do the same thing in ZFS: if there's not a lot of data pending to go to disk, we can spend lots of time compressing it, but if we've got hundreds of megabytes piling up and we really need to start syncing it out to disk, we can back off the compression level until we find the happy medium where we're getting as much compression as we can while keeping the disks busy.
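A minimal sketch of that adaptive mode, using the standalone zstd tool, which grew an --adapt flag for exactly this (the host and port are hypothetical, and the eventual ZFS property syntax was still being designed at the time of the talk):

    # Compress just hard enough to keep the network pipe full:
    cat dump.img | zstd --adapt | nc backuphost 9000

    # The ZFS integration would presumably follow the existing
    # compression property, something like:
    zfs set compression=zstd tank/backups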
There are a number of improvements to the resilver code coming. The first one is sequential resilver. Currently, as I mentioned at the beginning, when you resilver a disk in ZFS, because it intimately knows the file system, it only has to copy data that's actually in use; it doesn't have to copy the free space. But it scans in the order things appear in the metadata, so you get a lot of random reads during the resilver, and it can actually end up being slower. To fix that, as it scans the metadata, it builds a range tree, in up to a couple hundred megabytes of RAM, and once that's full, it takes the largest contiguous block of work, performs the resilver of that, and then continues scanning until it's filled up the RAM again. It keeps doing this, and that way you get nice, long, contiguous sequential reads and writes, and much higher performance on the resilvering. That work is being done by Nexenta. And then smart resilver is a feature coming from the ZFS on Linux people, which is a smarter prefetcher for the resilver process. When those two are combined, we can expect a four to eight times improvement in the time to resilver a fragmented pool, which will make a big difference. There are also performance enhancements coming from both FreeBSD and Delphix for the ZIL, which is the ZFS intent log. Using one of the other features of ZFS, you can offload the write cache to a high-speed device like an SSD, so synchronous writes from a database can be flushed to the SSD, letting the database transaction complete and continue, and then later be flushed out to the spinning rust as a large sequential write. There are some performance improvements going on there; now that we have NVMe devices and can have 64 concurrent transactions going on, there's a lot of improvement we can get there. There's also work from Lawrence Livermore National Laboratory to make a safe zpool import for clusters. So if you have multiple heads that have access to the same disks, this feature will make sure that you don't accidentally import the pool or mount the file system on two of the heads at once. And lastly, device evacuation, which is the ability to remove a mirror or a striped disk from the pool and basically move all the data that's on there to other devices, so that you can actually take a disk out of a pool. And one of the other interesting ones is the ashift policy change. When you create a pool in ZFS, the default assumed 512-byte sectors, but you can either force or detect the real sector size if you have 4K-native disks. The ashift policy is to deal with pools we created in 2011, when disks were 512 bytes, but now all the replacement disks I can buy are 4K. Setting an allocation policy says all new space that's allocated should be aligned to 4K, so that we won't pay the read-modify-write penalty on the newer drives. So, questions? OS X has this kind of Time Machine thing built into it, and going back, there was something similar for OpenSolaris called Time Slider. Are there any plans to provide something like that? Yes, in TrueOS there's a tool called Life Preserver that does the same thing, and I think that's available in the FreeBSD repository as well, so you don't have to use TrueOS to get it. But it's very much the same idea. Oh, sorry, the question was: is there a tool like Apple's Time Machine, or the one Solaris had called Time Slider, that uses the snapshot feature in ZFS to be able to step backward and forward and get your backups? And the answer was that yes, in TrueOS it's called Life Preserver, and I think it's in the ports tree for FreeBSD as well. Next one. Yes, in a previous slide you mentioned that there is some good work on increasing the performance of deduplication. Does that imply that the resource usage will be reduced as well? So the question was about the dedup performance improvement. Currently, this is just a design that Matt Ahrens came up with. The way dedup works now is there's a hash table in memory, and that same hash table structure exists on disk, so every time you make a change there, you have to write out three or four sectors' worth of data, and the performance can be really bad, and the hash table usually ends up not fitting in RAM. So there are two parts to this. The first is switching to using a log for the dedup table on disk.
So, for a certain hash, you say we're gonna increment the reference count for this hash by one, or decrement it by one, as blocks are added and removed from the pool, and then only once that log gets big enough will it be compacted and written out as a hash table again. And then another part of that is an opportunistic enhancement for the hash table, where you'll be able to, it's called the dedup cap or something, I forget, anyway: you set a limit to how much memory you wanna dedicate to the dedup table, and as that gets full, what it will do is, when you wanna add a new entry but there's no room in the hash table, it will look for a hash that has a refcount of one, meaning there's only one block that matches, remove it from the dedup table, and put in your new entry. Normally this would be a problem, because then what do you do when you wanna free that block? But because we only do it with refcounts of one, it means that if we go to remove a block and we search the hash table and it's not there, we know that it had a refcount of one and we can remove it. So this lets us take hashes of blocks that have been around a long time and never managed to dedup for us, remove them, and put in a new block, in the hope that a freshly written block has a higher chance of eventually having a duplicate written of it. So with the dedup ceiling, that's the name of it, the dedup ceiling feature, you can limit how much memory will be used by the dedup table, and that solves most of the resource problems as-is, and then the log feature solves the disk I/O problem of dedup. But again, both of these are just ideas that the co-creator of ZFS has had and presented at the Dev Summit for someone to pick up; his company doesn't use dedup, they just use clones of databases, so he's not actively working on it. Any other questions? You have one? At the back? So the question was: the example of a laptop breaking right before a presentation, was that a true story? Yes. Yes. Yes. 2016? Yeah. I no longer have to carry two laptops in case that happens. The next question was: has anyone in the ZFS community done performance analysis on the mitigations for Spectre and Meltdown? Not yet. In general, those are syscall-level things, so it depends on how many syscalls you're making; in general, probably an increase in latency. It might actually impact databases relatively heavily, but regular file server operations are probably not gonna be impacted that much. So it depends how you're using ZFS. Really, it's workload dependent, and each OS's mitigations for Spectre and Meltdown are different. I think we have time for one more question. What about ZFS spreading to other operating systems? There was some news after the latest Dev Summit that there were some Windows machines running it? So the question was about ZFS spreading to other operating systems. Yes, at the OpenZFS Dev Summit, which was in November, the developer who runs the OS X port gave a demo of OpenZFS ported to Windows. It has since been enhanced more: it supports replication, so you can send and receive to it, and the individual file systems don't each take a drive letter anymore, so if you have 40 file systems, you don't run out of drive letters. It's still not considered production quality at all, it's a pre-alpha demo, but there's a GitHub repo where you can check it out, and it's all native Windows API stuff; it's not using the Subsystem for Linux thing, it's native Windows APIs for the file system.
So we might finally get there, to have the universal file system that works in every operating system. The question was: do zones or jails have the same priority of access for IOPS and performance as the host? By default, yes. On FreeBSD we have a resource control system called rctl, where you can set kilobytes-per-second or IOPS limits per jail, so you can say a certain jail can only have 50 IOPS or something, but it's not quite queuing or priority levels. The next question was: can you change the size of the cache, the ARC, for each zone? No. That might be interesting, but the accounting of that would get really complicated, especially since, with clones and stuff, the same block could be used by both the host and the jail, and then who pays for it? And yeah, I think we're out of time. Thank you.