So, hello everybody. I don't think I really need to introduce Kirk, but there it is. Maybe I'll introduce you to Marshall Kirk McKusick, FreeBSD hacker. He's going to talk to you about the design and implementation of ZFS. Thank you.

So we were working on this little book thing that you probably know about, and I had originally thought that ZFS was just going to be a section in the chapter that talks about file systems. At some point one of my reviewers looked at it and said, "You only have six pages on ZFS. That's not enough." I thought about it for a bit and decided, yeah, you're right about that; I guess I have to do a whole chapter on it. So we reorganized the way the whole book was done, and there I was, staring at a chapter where all the text I had was the title.

So how do you go about actually figuring this out? There's got to be all kinds of documentation, right? There's tons of stuff out there about ZFS, but it's all about how to use ZFS, not about how it's actually put together. I found a paper that had been written for the LISA conference, but it hadn't been accepted, so there were only some copies of it floating around. There were some slide decks that had been used to talk to customers about it. And then I finally found the resource that I really needed, which was Matt Ahrens, who would actually answer my emails and, if I bought him lunch, would sit down so I could show him prototypes of things. And then of course there's the quarter million lines of code that you can read if you really want to figure out how things work. So I would try to read the code, and I'd come to one of these lunches, and Matt would say, "Yeah, that's an interesting way of thinking of it, but no, that's not the way it works." So really that chapter came about because Matt basically led me through it and made it happen.
And so I owe a great debt to him for doing that. At any rate, this, according to Matt, is the only complete documentation on how ZFS works, and considering it's only a thirty-page chapter, it's obviously just a brief introduction. Anyway, I then had to put together some slides for my tutorial, and in doing that I said, well, I can just take some of those for this talk. I did that, and then I did a dry run of the talk and it went on for an hour and fifteen minutes. So last night I whacked out four more slides, and I'll try to go through it a little more quickly than my dry run did.

So let me give you the overview of ZFS. The basic idea is that it's a new generation of file system technology: a never-overwriting — or copy-on-write, if you will — file system. Once something gets written, you don't go back and change it. With a traditional file system like the Fast File System, you seek to a point in a file and you write: you read the old block in, change the bytes you want to change, and write it right back to where it came from. That's an update in place. In ZFS, as you'll see, if you modify an existing file, that modified block will be placed somewhere else, and then the inode will be changed to point to that new copy of the block.

Because it's a non-overwriting file system, you don't have the problems you do with traditional file systems, where the file system can become inconsistent — you've updated some things but not others — and has to be rolled back using a log, or a journal, or fsck, or whatever your poison is that you happen to like. A non-overwriting file system is always consistent: you have a consistent version of the file system, and you move from one consistent version to another. What happens is that you write out everything that needs to be changed, and the last step — the thing that actually creates the checkpoint and moves you forward — is writing a new uber block. Think of the uber block as a sort of glorified superblock. Either that write has occurred or it hasn't: if it hasn't yet occurred, you have the old consistent version, and if the write has completed, you have the new consistent version. You're never at a point where the file system is inconsistent. You just atomically step forward, checkpoint by checkpoint.

It has things like snapshots, which are read-only, and clones, which are read-write. If you want a clone, you take a snapshot and then you make a clone of the snapshot, and you can then modify away. One of the common uses for clones is that you'll make a clone and do an update or an upgrade of the system in it. If it all works out, you just say, all right, that's now the file system; if it doesn't, you say, yep, just throw it away, let me try again. With a non-overwriting file system it's really easy to do snapshots, and it's really easy to do clones. You can have piles of them; in ZFS there are no limits other than the amount of disk space you have to throw at them.

ZFS has a lot of metadata redundancy and data checksums; we'll see that when we talk about the way the block pointers are implemented. We have selective data compression and deduplication. Deduplication requires a lot of memory to hold the dedup table, and if you deduplicate the whole file system you can often blow out your memory and it gets very slow. So ZFS gives you the ability to selectively say what's being deduplicated and what's being compressed, and you can do it just for the things where it makes sense.

Unlike a traditional file system, where you give it a certain amount of space when you create the file system and that's it, with ZFS you have a pool of blocks, and that pool of blocks is shared among all the file systems and clones and so on that are running in that pool. You have mirroring, and also single-, double-, and triple-parity RAID. I pulled the RAID material because it just takes too long to explain, unfortunately, but it's in the book. For space management, you can put quotas on users, and you can also reserve space: you can say, make sure that this amount of space is available to this file system, so that you don't have one file system go crazy on you and then everybody else dies a horrible death. And there's fast remote replication for backups, which I also won't talk about.

All right, so let's start with the structural organization. We have two main layers here: what's called the meta object set layer, and the object set layer. The meta object set layer is the thing that essentially is the pool. In traditional file systems, all the block allocation and space-map management is done down in the file system. Here the file systems don't mess with that stuff: when they need a block, they just come up to the pool and say, give me a block, and they use it, and when they're done with it they just hand it back to the pool.

At the top of all this is an uber block, and that's the thing that's actually taking the checkpoints. When we talk about moving from one consistent state to another, we're in fact not just getting a consistent state for a particular file system; you get a consistent state for every file system, every clone, everything that's in the pool. So what happens, in essence, when we get ready to take a checkpoint, is that we go to each file system and have it write out the stuff it needs to write out; when all the things in the pool have done that, we write out anything that then has to be updated in the pool itself; and then finally we update the uber block at the top, and that's when the checkpoint actually happens. I'm going to show you a little example of how all that works.

Okay, so in these pictures, when you see something that's just an arrow, that is a single block pointer. When you see one of these triangles, that is a set of blocks and indirect blocks and so on — all of the things with triangles on them can grow arbitrarily large. Think of it like an inode that allows files to get arbitrarily large: you just keep adding levels of indirect block pointers until you have enough to map out whatever you want. And an object set is the thing that's used to describe whatever object is being drawn.

So at the meta object set layer we've got this file, if you will, this set of things. The first thing in there is always the master node, and that's where we store various properties about whatever the object is — pretty much always there's a master node at the front to store properties, things like, for a file system, where is it mounted, and how are the privileges being managed, et cetera. Then each of the things after that is some particular underlying object: either a snapshot, or a file system, or a clone, or a zvol. A zvol looks sort of like a raw disk partition. And finally, at the very end, we have the space map, and the space map is keeping track of what space is available — think of it as a big bitmap, with one bit per block.

In this particular example I've taken a file system, so this again is just a pointer to another object set, and the object set that we use for a file system is essentially a set of inodes. If you think of it logically, we've got all the inodes that make up the file system, and they're just in an array. You just index into this array.
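As a rough sketch of that indexing — the constants here are illustrative defaults (512-byte dnodes, 128K blocks for the dnode file), not something taken from the talk's slides — finding an object really is just array arithmetic:

```python
# Toy sketch of object-set indexing; names and constants are
# illustrative, not the real ZFS definitions.

DNODE_SIZE = 512          # bytes per dnode (assumed)
BLOCK_SIZE = 128 * 1024   # 128K blocks making up the dnode array (assumed)

DNODES_PER_BLOCK = BLOCK_SIZE // DNODE_SIZE   # 256 dnodes per block

def locate_dnode(objnum):
    """Return (block index, byte offset within block) for object `objnum`."""
    block = objnum // DNODES_PER_BLOCK
    offset = (objnum % DNODES_PER_BLOCK) * DNODE_SIZE
    return block, offset
```

With these numbers, object 257, say, lives in the second 128K block of the array, 512 bytes in — no lookup structure needed, just the object number.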
So here we have the directories, files, symbolic links, et cetera, and for a file, of course, there's going to be a set of indirect block pointers describing the data that is the contents of the file. So: the uber block anchors the pool; the meta object set layer has an array of all of the objects — all the file systems, clones, snapshots, and so on; and then each of those objects references an object set which describes the set of whatever it holds. In the case of a file system, that's all the inodes that reference all the bits and pieces making up that file system.

All right, so, block pointers. When we went from FFS1 to FFS2, we realized that 32-bit block pointers weren't big enough to deal with large disks, and so we went to 8-byte, 64-bit block pointers. ZFS held nothing back on block pointers: the ZFS block pointer is the Titanic of block pointers — 256 bytes of block pointer.

So what can we possibly do with all that? Well, the first thing you get is the redundancy, because there are potentially up to three different pointers to something on disk. A reference to the disk has, first of all, a 64-bit offset — actually 63 bits, because one bit marks this thing called a gang block: if the disk gets too fragmented and we can't get the space all in one place, the block is made up of some smaller pieces. We have the device on which it resides. GRID is just reserved, not currently used. And we have the size of the thing we're pointing at on disk. You'll notice there are three sizes: asize, psize, and lsize. The asize is how much actual disk space is being used, including, for example, any RAID parity blocks and all the other stuff we need — how much physical disk we are actually consuming.

By default, any metadata in ZFS has at least two copies made — an indirect block will always have at least two copies, and you can crank it up to three if you want. The first copy is referenced by the first pointer, the second copy by the second, and if there's a third copy it's referenced by the third. ZFS carefully tries to make sure that if you have multiple copies, and you in fact have multiple disks making up the pool, the copies will be on different media, different disks — so that if one particular disk goes down, you'll be able to pick the data up off one of the others.

The level field just tells us where we are in the levels of indirection: single, double, triple indirect, whatever. The checksum field tells us what algorithm we're using for the checksum, and down at the bottom we have the checksum of the contents of the block. Note that the checksum is not stored with the block itself: it's stored here, in the block pointer. And of course we need just one, even if we have multiple copies, because they should all have the same checksum value. In fact, if you have, say, three sets of redundancy, and you pull in the first copy and the checksum fails, you can pull in the other two, verify that the checksum does work on them, and then use one of those to repair the bad copy. By not storing the checksum where the data is stored, if the data somehow gets corrupted, the hope is that the checksum, being stored elsewhere, will not have been corrupted along with it.

Then you have the physical size and the logical size. These will normally be the same, but if you're doing compression, the logical size is typically going to be bigger than the actual amount of space needed to store the data, because compression has made the physical size go down. The compression field just tells you what algorithm you've used for the compression.

We also have the birth times, and I'm not going to go into all the details, but these times are not wall-clock time; rather, they record in which checkpoint this thing got created. We start at checkpoint zero when we first create the pool, and the checkpoint number just keeps incrementing each time we do another checkpoint. We chose to use the checkpoint number rather than time because if you took two checkpoints within the same second, you might end up with the times matching, and you don't want that to happen.

Okay, so whenever I talk about a block pointer, I'm talking about one of these things. Because they're very large, you would like them to refer to something fairly big, given the overhead you're incurring by having them. So most blocks in ZFS are 128K. Small files can use smaller pieces: if you're running on disks with 4K sectors, a small file may use just a single 4K sector, so the pointer may point at something that's only 4K. But typically, any time a file has grown over 128 kilobytes in size, it will be made up of some number of 128-kilobyte blocks.

Okay, so, management of blocks. As I already said, the blocks are all kept in a pool, and the multiple file systems and all their snapshots and clones are also held in that pool. Blocks from the pool are given to the file systems as they're needed, and reclaimed back to the pool when they get freed.

Now, it turns out that the freeing of blocks is actually one of the more difficult things to deal with in ZFS — not difficult in the sense that it's hard to get right, but it takes a lot of code and some rather interesting algorithms to figure out when a block is really free. What happens is that you first allocate the file, and then you take some number of snapshots, and all of those snapshots are also going to have inodes that refer to that block. So just because you remove the file from the file system doesn't mean we can actually free the block: we can't free it until it's both free in the file system and we've gotten rid of all the snapshots that reference it.

The quick — but not completely accurate — way I can describe this is with what are called dead lists. When you remove a file, we go through and find all the blocks that are in that file, and then we ask: are there any snapshots of this file system? If so, they'll be on a list, sorted from the one created most recently to the one created furthest in the past. We pass the blocks we want to free down to the most recent snapshot, and if that snapshot is still referencing a block, it says, oh, well, I'll hold on to that. If it's not, the block just trickles down until all the snapshots have had an opportunity to look at it, and if none of them want it, then and only then does it get handed back to the pool and actually made free. And as I already mentioned, you can reserve space to make sure it'll be there, and you can impose quotas.

All right, so here is that same picture you saw before, now drawn in a bit more detail. Again, we've got the uber block at the top, and we have the object set, which you saw before. Embedded in that is a dnode, and a dnode, to a first-order approximation, is what we would call an inode in a traditional file system: it's the data structure that keeps track of certain properties of the object and, primarily, keeps track of the block pointers.

Now, unlike the traditional file system, where we have direct blocks and single indirect and double indirect and so on all mixed together, in ZFS we start with a single direct block pointer. Then, if the file gets bigger than 128K, so that it needs two pointers, instead of keeping the direct block pointer and creating a single indirect block alongside it, we just promote: we allocate an indirect block, make what was previously the direct block pointer the first entry in that single indirect block, and then start growing the single indirect block. If we fill it up, we allocate a double indirect block, make the single indirect block the first entry of the double indirect, and so on. So a file has either a single direct block pointer, or a pointer to a single indirect block, or a pointer to a double indirect block, or a pointer to a triple indirect block — not all of them mixed together, as in the traditional file system. So we have some number of levels of indirect block pointers.

What we end up with, then, is an array of these dnodes, and the dnode has an area at the end of it that is sort of free space, usable for different purposes; depending on what the dnode references, we put different things into that free area. We use a thing called a dataset when we're referring to most of these things, like file systems and snapshots.

So we have the original master node here, and since it can hold an arbitrary amount of stuff, we use the dnode mechanism to scale it up to however big it needs to be. For a file system or a clone, this thing points down to an object set which actually has three dnodes in it: two of them are used for the user quota and the group quota, and the other one describes the array of all of the inodes — dnodes, actually — that describe the files and directories and so on.

You'll also see this little pointer off to the side: the ZIL, the ZFS intent log. One of the issues we have with ZFS is that, although the file system is always consistent, some number of changes have occurred since the last checkpoint was taken, and if the system crashes, then when we reboot, the state we get back is whatever the state was at the last checkpoint. So if you've been making changes since then, unless you have some kind of log, you're going to lose them after a crash. The reason the intent log is particularly important is so that you can implement fsync. When you see how much work it is to take a checkpoint, you'll understand that we can't implement fsync by simply taking a checkpoint: a checkpoint is a big deal, and it takes a long time. You could do a few of those a second, but you couldn't do hundreds of them a second — and if you're running something like an SMTP server, you may be doing hundreds of fsyncs a second, based on the rate of incoming mail. So we use the ZFS intent log to log the fsyncs, so that after a crash we can start from the stable version of the file system and then run through the intent log to make sure we get back at least everything we agreed was going to be there.

Okay, so this is not a traditional journal. A journal only tracks the metadata changes.
This is a full log: into the intent log goes not only the metadata changes that have happened, but also any data — because fsync is committing data, of course.

All right, so when you take a snapshot, all you really do is take a reference to this object set. Over here you see the snapshot; you'll notice there's just a single dnode shown in this picture, but there are in fact three — there's a frozen version of the group quota and a frozen version of the user quota as well. That's convenient, because if you later take a clone — again, you take a snapshot and then take a reference to that object set — then as the clone changes, it creates new copies and leaves the old unmodified ones behind, so the snapshot doesn't change.

For zvols — the things that look like a disk partition — it's a much simpler structure. A zvol just has two dnodes underneath it: one holds the master information, and the other is the array of blocks that make up the disk partition. Once you have one of these, you can run a database on what it thinks is a raw disk, or you can format a traditional file system in there if you want. But you get the benefit that you can take a snapshot of your zvol just like you take a snapshot of a file system, and make clones of it — you get all the functionality that the meta object set layer up here normally provides, for your disk partition.

Okay. So far, so good.
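The dead-list idea from a couple of slides back can be sketched like this. This is a toy model, not ZFS code: real ZFS decides ownership by comparing block birth times against snapshot creation times, where here each snapshot is simply modeled as the set of blocks it still references.

```python
# Toy sketch of dead-list handling when a file is deleted.
# `snapshots` is ordered most-recent first; each entry has a `refs`
# set (blocks that snapshot still references) and a `deadlist`.

def free_blocks(blocks, snapshots, pool_free):
    """Offer each freed block to the snapshots, newest first."""
    for blk in blocks:
        for snap in snapshots:                 # newest -> oldest
            if blk in snap["refs"]:
                snap["deadlist"].append(blk)   # held until that snapshot dies
                break
        else:
            pool_free.add(blk)                 # no snapshot wants it: truly free
```

A block only reaches the pool's free space after every snapshot has had a chance to claim it — which is exactly why removing a file does not immediately return its space.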
Haven't lost anybody? Well, not everybody. Okay, so checkpoints are the key thing here, and I'm going to spend most of the rest of my time talking about how you actually take a ZFS checkpoint.

When ZFS is running, you're collecting all of the changes that are happening in memory, and if a lot of writing is going on, you're chewing through memory at a pretty good clip. At any rate, you're not writing anything to the file system yet; you're just collecting it all together in memory as the modifications happen. You're collecting the new data that's being written; when you grow a file, of course, you have to update the inode with the new size, and potentially with new block pointers, et cetera; and all of those changes — the inode changes, the actual data being written — all of that is just being collected in memory, not written to disk.

Now you say, all right, I want to take a checkpoint. The reasons for a checkpoint are that a certain amount of time has passed, or that you've accumulated a certain amount of dirty data — either of those triggers a checkpoint. A checkpoint will also happen if the administrator comes in and says, I want to take a snapshot: you start by taking a checkpoint, and then you create the snapshot of that checkpoint.

So you gather together all these things that have changed, you go find a chunk of available disk space, and you write it. If there's a lot of contiguous space, all of the modifications that have been made get written in one big write — and this is the reason that writing in ZFS is so fast. In a traditional file system, an update means: I've got to go over here and write the data for this file, and over there write its inode, and over there do something with a directory. Instead of having those writes scattered all over the disk, it all goes down in one place. And once you do that, all of that I/O completes.
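That sequence — write all the new copies to fresh space, wait for the I/O, and only then advance — can be sketched with a toy append-only disk. The names here are illustrative, not the real ZFS entry points:

```python
# Toy model of a checkpoint: copy-on-write block store plus the one
# uber block slot, which is the only thing ever overwritten.

class Disk:
    def __init__(self):
        self.blocks = []        # every block ever written; never overwritten
        self.uberblock = None   # index of the current root block, or None

def take_checkpoint(disk, dirty):
    """Write dirty blocks to fresh space, then advance the uber block.

    `dirty` is ordered leaf-first, with the new root of the tree last.
    A real system would wait for all of step 1's I/O to complete
    before doing step 2.
    """
    start = len(disk.blocks)
    disk.blocks.extend(dirty)                 # step 1: one large write
    disk.uberblock = start + len(dirty) - 1   # step 2: the single atomic step
    return list(range(start, start + len(dirty)))
```

Crash before step 2, and the old uber block still points at the old, fully consistent tree; the freshly written blocks are simply unreferenced and cost nothing.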
It's all there — any RAID-Z stuff has all been dealt with — and then, and only then, the last step is that you write the uber block. Now, the uber block is not just one uber block; there are actually typically several thousand of them, and I don't have time to explain how we manage uber blocks, but at any rate, the uber block that represents this checkpoint gets written. So the entire pool is always consistent, because when we write that uber block, either we haven't written it and we have the old version, or we have written it and we've got the new version.

As I've already said, the checkpoint affects all the file systems, all the clones — everything in the pool gets snapshotted at once. And as I also said, you need to log changes between the checkpoints in order to have persistence.

All right, recovery starts from the last checkpoint. You come up, you find the uber block for the most recent checkpoint, and you find the intent logs — there's one for every file system and every zvol — and then you just roll forward through each log. As you go through the log, it says: write this, do that, do these other things. That just builds up a whole bunch of state in memory, exactly as normal operation would, and when you've got all that done, you do a checkpoint and say, okay, boom, we're now all caught up, and we can reset the logs, because we're ready to move on.

Okay, so what actually is involved? I keep saying we've got all this dirty data, but let's look at what we actually have to do. This is what you would have to do if one file had one block added to it, just to give you an idea; there are nine things that are going to have to be changed. Where we start out, down here at step one, is the actual new data that got written.
That's not too surprising. But since we've added another block to the file — let's say the file has a single level of indirect blocks — we have to update that single indirect block with the pointer to the new data. And we can't change the existing single indirect block, so we have to make a copy of it with the update applied. If there were a double indirect block above it, that would have to change too, because the thing below it changed, and therefore the thing pointing at it changes. You have to trickle your way all the way up through all the indirect levels until you finally get to the inode. Now of course the inode is pointing at a new block, so the inode — the new dnode — has to be written.

And because that has changed, that in turn changes the file of dnodes containing it, which means we trickle changes up through all of its indirect blocks to the object set. Now the object set has a new pointer, so the object set has to be rewritten; the object set has been rewritten, so the thing that points to it has to be rewritten; which means that has changed, so we have to change all the indirect blocks going all the way back up to the top, making a new copy at each step. And then finally the last step is to point at that whole new tree. So we figure out all the blocks that have changed, all the way through steps one to eight, we gather all of those now-modified blocks, we write them out, and once we get confirmation that they've all been written, then, finally, we update the uber block. The uber block is the only thing that we ever overwrite in a ZFS file system.

Sorry — yes. Oh, yes, good point. He's pointed out that I allocated a block out of the space map, so the space map has to change as well. If you look back at the previous slide, you'll see that the space map is itself a file. So we've changed one thing in it, which means we have to change all the indirect blocks above it, and those changes come along with everything else as we trickle up. So yes, more blocks have to be allocated and dealt with for that.

Okay, so you can see why we can't implement fsync simply by taking a checkpoint: the amount of work we have to do, and the amount of space we need to allocate, would make it just way too inefficient. Now, it looks really bad because of all the things trickling up here, but suppose two files changed in this file system. There'd be a few extra blocks for the second file, but all the rest of the stuff trickling up we've already had to change because of the first file. You don't pay for all of this on every modification, and that's why, if you aggregate together a bunch of changes, the overall cost of the trickle-up is not nearly as bad: we've already had to update the space map, we've already had to update all of these upper levels, so updating one more file is much cheaper than the first one.

All right, so I just want to finish up by summarizing the strengths and weaknesses of ZFS. ZFS is not going to replace FFS, because it requires a lot of resources. It's very well designed for large pools of data and lots of file systems, but if all you have is a little embedded appliance, and you need a file system to manage a small amount of stuff, you're not going to do that with ZFS.
I mean, that's just overkill, and you don't have the resources to do it. For that sort of embedded appliance, where you usually have a single disk drive, FFS continues to be the right solution. Where you've got large pools, ZFS just blows FFS out of the water, because it has all this redundancy and checking and other capabilities that FFS could only dream about.

Okay, so what are its strengths? The high write throughput, as I already mentioned: instead of having to scatter stuff all over the place, it's just getting chunked down in one place. It uses RAID-Z, which I haven't had a chance to describe to you, but with RAID-Z, since the RAID is integrated in with ZFS, each block gets its own RAID stripe. So you don't have partial stripes that you're filling; you're always filling exactly the block. The upshot is that when you reconstruct — when you want to rebuild a disk — you don't have to go through and reconstruct every block on the disk, because you know which blocks are being used. The way you reconstruct RAID-Z is to walk across all the file systems in the pool, figure out which blocks they're using, and reconstruct those blocks. So if you have a pool with relatively low utilization, it's actually faster to reconstruct a RAID-Z pool than it is to reconstruct the whole physical medium. Unfortunately, if your pool is mostly full, it actually takes considerably longer to reconstruct RAID-Z, because there's a lot of random-access work, whereas a traditional RAID rebuild happens sequentially across the disk.

ZFS also doesn't have the write-hole problem with RAID. The write hole is where you're updating a stripe and you've written some pieces of it but not others; you have to have some NVRAM or some other mechanism so that if the power fails, when it comes back up you know how to finish writing that stripe. ZFS doesn't need to worry about that, because it's not going to advance the checkpoint until it knows all of the stripes are completely written. So it never has any block referenced by a file system that is incomplete.

You can move blocks between file systems as needed. How many times have you statically created FFS file systems and then wished you'd put more blocks in one file system and fewer in another? It would be really nice if you could just reach over into that other file system, which has all that free space, and borrow some of it — but obviously you can't do that. Whereas with ZFS, they're just blocks in the pool: if a file system needs a lot of space, it gets it, and if it doesn't need it, it gives it back and some other file system can use it.

It's monolithic, and I have a long list of things I don't like about monolithic, but one of the benefits is that everything's integrated together. Those master nodes, for example, keep track of where things are mounted, what things are exported, and what the properties of those exports are. So you don't have to maintain all these other files like /etc/fstab and /etc/exports; ZFS just tracks all of that. It knows where things are, and it makes all the mounts and exports happen as they should, so it eases the administration.

Okay, so where does it fall down?
Well, if you write a file slowly, then its blocks are going to end up scattered all over the disk. Think of a log that's being written over several days' time. The blocks that make up that log are going to be all over the disk, because they're written as they're created, and so if you now want to grep that log, it's going to take a long time to run all over the disk to pick up all those pieces. The way ZFS deals with this is it simply makes sure that it has enough cache, so that any files you've read over the last day or so are just going to be in the cache, and then the fact that they're not well laid out on the disk doesn't matter. One of the reasons that ZFS wants eight to sixteen gigabytes of memory to be available is to mitigate these problems.

I've already talked about reconstructing a nearly full pool; it can go up to ten times slower than if you just rebuilt the physical medium.

The block cache has to fit in the kernel's address space. In a traditional file system, we just map the blocks into the kernel when we need to look at them, so you can have a 32-bit processor with 16 gigabytes of memory and use all of that memory as the buffer cache. In the case of ZFS, the way it's implemented, it wants all of physical memory to be mapped into the kernel. So if you're running in a 32-bit address space, your cache can't really be bigger than about a gigabyte, no matter how much physical memory is on the machine. The answer to this is: if you're going to run ZFS, just make sure you're on a 64-bit processor. FreeBSD will support it on 32-bit processors, but you will not be happy.
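The slow-log fragmentation just described follows directly from copy-on-write allocation, and a tiny simulation makes it concrete (an illustration with made-up numbers, not ZFS's actual allocator): each new write lands wherever the allocator currently is, so log appends interleaved with other pool traffic end up far apart on disk.

```python
# Sketch of why a slowly written file fragments under copy-on-write
# allocation: blocks land wherever the allocator currently is, so
# writes interleaved with other traffic end up far apart.

class Allocator:
    """Hands out monotonically increasing block addresses, shared by
    everything writing to the pool."""
    def __init__(self):
        self.next = 0

    def alloc(self):
        addr = self.next
        self.next += 1
        return addr

alloc = Allocator()
log_blocks = []
for day in range(3):
    log_blocks.append(alloc.alloc())  # one log append per "day"
    for _ in range(1000):             # other pool traffic in between
        alloc.alloc()

print(log_blocks)  # [0, 1001, 2002] -- nowhere near contiguous
```

Grepping that log means three widely separated seeks instead of one sequential read, which is exactly why ZFS leans on a large cache to hide the layout.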
So just don't do that.

Okay, if the pool gets more than about 75 percent full, the allocations start to get very, very painful. That's because it wants these 128K blocks, and if there's not enough space to get them, then it has to start taking smaller pieces and putting them together in these things called gang blocks, and that's just painful. So don't really plan to run more than about 75 percent utilization. The good news is, if you see you're getting too full, you can always just add more disks, because that just adds more space to the pool, which can then be handed out to all the file systems. So unlike a traditional file system, where you say, "Oh, I'm getting too full" and there's nothing to be done, with ZFS you can just add disks to it, which is not a property that you typically have with a regular file system. By contrast, FFS will go to 95 percent quite happily, and will go to about 99 percent somewhat less happily.

All right, RAID-Z has a lot of good properties associated with it, but one of the less nice properties is that if you're using 4K blocks, then you have a 50 percent overhead if you're doing single-redundancy RAID. The 4K blocks are typically used by zvols or databases, so you are going to have a high overhead if you choose to use, say, 4K blocks.

And finally, the thing that's really a pain is that the blocks cached in memory are not part of the unified buffer cache; it's got its own little world with the ARC. And so if you're doing mmap or sendfile, you end up doing an extra memory-to-memory copy every time you read or write something.

Okay, and I'm being yelled at to be done, so I'll push the button and get to the questions. We do ask that you talk into the microphone, so that people who are not in the room but are listening can hear your very insightful questions. Surely somebody has a question.

So what about symlink performance and implementation?

What about symlinks?
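The 50 percent figure for small blocks comes out of simple arithmetic. Here is a back-of-the-envelope sketch (an illustration, not exact ZFS space accounting), assuming a 4K-sector pool and single-parity RAID-Z where each block gets its own stripe with one parity sector per up to (width − 1) data sectors:

```python
# Back-of-the-envelope arithmetic (an illustration, not exact ZFS
# accounting) for single-parity RAID-Z overhead, assuming 4K sectors
# and one parity sector per up to (width - 1) data sectors per block.
import math

SECTOR = 4096  # assuming a 4K-sector pool

def raidz1_overhead(block_size, width):
    data = block_size // SECTOR          # data sectors in the block
    parity = math.ceil(data / (width - 1))  # parity sectors needed
    return parity / (data + parity)      # fraction of space lost to parity

print(raidz1_overhead(4 * 1024, 5))    # 0.5  -> 50% of the space is parity
print(raidz1_overhead(128 * 1024, 5))  # 0.2  -> 20% for large blocks
```

A 4K block is a single data sector, so it always drags along one full parity sector, no matter how wide the stripe is; large 128K blocks amortize the parity across many data sectors, which is why the default block size fares so much better.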
Why? What would be particularly difficult about them? I mean, that's just a pathname that points to a file.

Yeah, but there were places in the headers...

You're talking about having to use a whole block to hold a symlink; is that the issue? The thing is that in UFS, if the symlink is small enough, it's stored inside the inode and not as a file.

Yes. Okay, so remember I said that the dnode has this extra space at the end of it? That extra space can be used to hold a symlink, so they pretty much play the same game as we do in FFS for that. Anyone else?

I think almost everybody in this room knows of the difficulty in installing and adding devices to ZFS pools with four-kilobyte physical sectors, I mean those advanced-format disks. What is done is that gnop devices are made and then they are added to the pool; otherwise ZFS reads the device's properties and starts to use 512-byte sectors. So in the next versions, are there any improvements in this?

Well, I can only speculate on what the ZFS developers are going to do, since I don't actually develop ZFS, but given the issue that you have raised, I would be very surprised if that's not something they plan to do.

Oh, I'm told you have to speak into the microphone. There's a sysctl for the minimum ashift that you can set, so that you can force any device added to your pool to be treated with a block size of 4K.

Okay, and when did that show up?

It's already in 9.3 and will be in 10.1.

Okay. So, Kirk, can we expect you to join the ZFS development team anytime soon?

Can you expect me to join the ZFS development team? Probably not. When I was younger, I used to take on projects. You probably know about this, because you do the ZFS work: you think you're done, and then they keep coming back to you and say, "Well, how about this, and how about that, and can't you do this, and can't you do that?" So
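The inline-symlink trick mentioned in the Q&A can be sketched as follows. This is a conceptual model, not ZFS code, and the size is a made-up stand-in for the dnode's spare ("bonus") area: if the target path fits there, it lives inside the dnode itself; otherwise it spills out to an allocated data block, just as FFS does with short symlinks in the inode.

```python
# Sketch (not ZFS code) of storing short symlinks inline: if the
# target fits in the dnode's spare "bonus" area, keep it there;
# otherwise allocate a real data block. BONUS_BYTES is hypothetical.

BONUS_BYTES = 320  # made-up size for the spare space in a dnode

def store_symlink(target: str):
    data = target.encode()
    if len(data) <= BONUS_BYTES:
        return ("inline", data)  # lives inside the dnode itself
    return ("block", data)       # needs its own allocated block

print(store_symlink("/usr/local/bin/python")[0])  # inline
print(store_symlink("/" + "x" * 400)[0])          # block
```

Most symlink targets are short, so in the common case no extra block and no extra disk seek are needed to resolve them.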
this has happened to me. Early on I took on the fast file system, the VM system, and NFS, and I've managed to shed the VM system and NFS, thankfully, but I'm still on the hook pretty much for FFS. So I'm not interested in finding another project that I could be on the hook for.

This, spoken by the person who did the original port of ZFS to FreeBSD, let me say.

All right, well, it's definitely time for us to have some lunch. So thank you very much. Thank you.