Tom here from Lawrence Systems, and ZFS is a cow. Well, not that kind of cow: it's a copy-on-write file system. I've mentioned a few times that this is one of the reasons ZFS is such a great system, but today I wanted to go deeper and explain more about how that works and how it compares to journaling file systems.

Let's start with some fundamentals of file systems and how they work. File system information can be divided into two parts: data and metadata. Data are the actual blocks, records, or any other grouping the file system uses to make up a file. Metadata are the pieces of information that describe a file: they reference where it is, and may also include things such as the time and date it was created. You know, things that aren't directly part of the file's contents, but that the file system needs so you know the file is there, when it got there, and who has permission to view it. That's all metadata. So there are two separate pieces here.

Now, many of the file systems currently used in modern operating systems work with a technique called journaling. A journal is a special log in which the file system writes its actions before performing them. If the operating system crashes while the file system is performing an action, the file system can complete pending actions on the next boot by checking the journal. Examples of popular journaling file systems include NTFS, ext4, and XFS.

But journaling file systems have a big problem. They generally only provide metadata integrity and consistency, because keeping both data and metadata changes in the journal would introduce an unacceptable performance overhead: basically, writing all that data out twice. That's why journaling file systems choose to log only the metadata changes. This way they can recover if a file fails to write because of a system failure, like a reboot or a sudden power loss.

Now, a copy-on-write file system improves upon this process and
does both metadata and data consistency and integrity checking. Consistency in a COW-based file system is guaranteed using transactional semantics. A transaction has to complete in its entirety, the whole series of events. More specifically, if a transaction succeeds, all the operations that belong to it have completed successfully and the changes are written. If it fails, it's as if nothing happened at all: the data and metadata in the file system are completely unchanged. So it's only when the entire commit happens, with all the operations and checks in place, that the new data becomes part of the file system.

Now, while a COW system provides full data consistency, integrity is a different thing. ZFS handles this by also creating checksums to ensure that the data does not suffer from degradation, and verifying those checksums is what ZFS scrubs do. Of note: if you only have a single disk with ZFS and there is an integrity issue, there's not always a way to correct it, as there aren't any other copies of the blocks. But when you're using ZFS with something like RAID-Z1, 2, or 3, the system will attempt to pull that data from the other disks to correct the corruption.

Now I want to move over to Ars Technica, because they have a good write-up on this, and we can provide some visuals rather than have Tom talk about this while you try to visualize it in your head.

All right, so now let's go for the visuals here in this really good Ars Technica article called "ZFS 101: Understanding ZFS storage and performance." I'll leave a link to this down in the description. Now, there's a lot to read in here, but we're going to focus and start on the copy-on-write semantics. The transactional semantics of how this works are important and somewhat complicated, but they're also why ZFS is a pretty awesome file system.

So consider a conventional file system asked to overwrite data in place. It does exactly what you ask it to and modifies each block just where it lies. So you say: hey, there's a file here.
I want to replace it with the new version, and you save over the top of that file, in place. The metadata then gets updated. If it doesn't, and there's a loss because the transaction wasn't complete, hopefully the journal has a copy of it, because it was written to the journal first, and then the recovery can happen. So that's the whole process: either the write works, or it doesn't and the file system recovers from the journal.

Now let's go to the next slide here. Consider a copy-on-write file system asked to modify blocks in place. This is what it really does: write the new version of the block, then unlink the old version in favor of the new one that's been written. And behold, they call it the data comet. There's a link to another demo here by Jim Salter where they dive into it, and he calls it the data comet because, I guess, you can kind of see the trail it leaves behind.

And this is really important: it's not unlinking until it's committed. Until the whole series of transactional semantics occurs, all the integrity checking, everything that's within that transaction, the old blocks are not unlinked. That's the final step. So, as I said, if the power goes out, it's as if the transaction never happened. If there's a failure, if there's a network drop, anything that keeps the whole series of operations from completing, nothing changes. This is how you end up with a file system that doesn't write corrupted data.

This also comes into play a little bit when people ask whether you need ECC to run a system that uses the ZFS file system. It's good for error correction.
It's not a bad thing. But ECC is just one more factor, and it's not going to affect this. This integrity check is done whether you have ECC or not. This integrity check is what makes ZFS, or any copy-on-write file system, keep its integrity.

Now let's abstract our visualization of copy-on-write. If we ignore the real physical location of blocks, we can simplify our data comet to a data worm moving from left to right across the map. This is just a different way to visualize what's happening here. You can see the same thing: the data is kind of trailing behind because of the way the transactions work.

Now, this is where people get a little confused when I'm talking about snapshots with ZFS, but this is really important. Now we can get a good idea of how copy-on-write snapshots work. Each block can be owned by multiple snapshots, and will not be unlinked until every snapshot referencing it is destroyed. People ask this question a lot: well, how much data does a snapshot take? It's a pointer. It's a metadata pointer for the blocks, and the snapshot locks that metadata, so the blocks the snapshot is pointing at will stay there until you delete the snapshot. This is why snapshots technically take no room when you create them: in some ways they're like a differential, at the file system level, against any changes that occurred afterward. This is what ZFS does to make these snapshots work, and also make them work very fast.
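
To make the write-then-unlink idea and the snapshot pointers a bit more concrete, here is a toy sketch in Python. It is not how ZFS is implemented; the `CowStore` class and all of its method names are made up for illustration, and it only models the two rules described above: a write goes to a new block before the old one is unlinked, and a block is only freed once nothing, including any snapshot, still points at it.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes; blocks are never edited in place
        self.live = {}       # file name -> block id (the live metadata)
        self.snapshots = {}  # snapshot name -> frozen copy of the metadata
        self._next_id = 0

    def write(self, name, data):
        # 1. Write the new version into a brand-new block.
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data
        # 2. Commit: repoint the live metadata at the new block.
        old_id = self.live.get(name)
        self.live[name] = new_id
        # 3. Only after the commit, unlink the old block if nothing needs it.
        self._free_if_unreferenced(old_id)

    def snapshot(self, snap_name):
        # A snapshot copies pointers only, never the data blocks.
        self.snapshots[snap_name] = dict(self.live)

    def destroy_snapshot(self, snap_name):
        old_meta = self.snapshots.pop(snap_name)
        for block_id in set(old_meta.values()):
            self._free_if_unreferenced(block_id)

    def _free_if_unreferenced(self, block_id):
        if block_id is None:
            return
        referenced = block_id in self.live.values() or any(
            block_id in meta.values() for meta in self.snapshots.values()
        )
        if not referenced:
            self.blocks.pop(block_id, None)


store = CowStore()
store.write("config", b"v1")
store.snapshot("before-change")
store.write("config", b"v2")
blocks_with_snapshot = len(store.blocks)     # old block is held by the snapshot
store.destroy_snapshot("before-change")
blocks_after_destroy = len(store.blocks)     # old block can now be freed
print(blocks_with_snapshot, blocks_after_destroy)
```

In this sketch the snapshot costs nothing beyond a copy of the pointers. The space only shows up later, when the live data changes and the old block has to be kept around because the snapshot still references it.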
There are a couple of different methods I've talked about where you snapshot something and can then revert back to it pretty much instantly, because you're just relinking the current metadata in the main dataset back to the point in time of the snapshot metadata. And because the snapshot metadata kept those data blocks locked, the system can just say, oh, I want to be at that version, and it happens almost instantly. It's extremely fast to roll back and forth between snapshots. Or, to go a little bit further, you can clone a snapshot to a new dataset: it locks that data and then creates a new dataset, essentially a fork, that gives you read-write access to it.

Now, for those of you who want to dive into some of the other things going on in ZFS, that Ars Technica article is great. Like I said, I'll leave it linked in the description. It has a lot of information on how the ZIL works and how replication works, and these are all things I've talked about before. For example, TrueNAS is really popular among ZFS implementations, and you can do a lot of these things on there. And I've always said it's faster, and this dives into why.

A couple of things of note. Something that's come up is people asking me about ZFS now that it's become the default for pfSense: starting at 2.6, or 22.01 for pfSense Plus, it is the default install. With single drives, is it still a good thing? Now, as I mentioned, single drives can't protect you from bit rot, because they don't have the replicated data. I know there's an exception, a way it can be set up so that it double-writes to a drive, but I don't know exactly how to implement that, and that's off topic. A single drive still has all the copy-on-write features. This means that for systems that are firewalls, like pfSense, if there were a sudden power loss, the only way you could potentially lose data is if there was a power loss at the
moment you tried to submit a configuration change. But if you did, and the transaction did not complete all the way before the sudden power loss, you wouldn't end up in an in-between state. You would just still have the previous config, because the file system would not have linked in the new data. Those are the copy-on-write semantics. So this is one of the reasons I think ZFS, or copy-on-write in general, possibly has a future in operating systems as a boot file system.

And yes, I'll give a shout-out to Btrfs. I'm not intimately familiar with it, but it is also a copy-on-write file system. I think ZFS and Btrfs are probably the two most popular ones out there, and this is why they are so popular: the semantics of how this works. Read through this Ars Technica article to get a better understanding, and of course to learn about all the other things you can do with ZFS. We kind of focused in on this one piece, but I wanted to get it covered because the question comes up a lot.

And I think it's kind of funny: I've been accused many times of being too much of a ZFS fanboy or, my favorite, and more than one person has said this, of being in a cult of ZFS. I'm okay with that: be a cult with integrity.
That's something I think matters. So nonetheless, let me know your thoughts on this. We can have a more in-depth discussion over in the forums, and I'll see you there. Thanks.

And thank you for making it all the way to the end of this video. If you've enjoyed the content, please give us a thumbs up. If you would like to see more content from this channel, hit the subscribe button and the bell icon. If you'd like to hire us for a short project, head over to lawrencesystems.com and click the Hire Us button right at the top. To help this channel out in other ways, there's a join button here for YouTube and a Patreon page where your support is greatly appreciated. For deals, discounts, and offers, check out our affiliate links in the description of all our videos, including a link to our shirt store, where we have a wide variety of shirts that we sell, and designs come out, well, randomly, so check back frequently. And finally, our forums at forums.lawrencesystems.com are where you can have a more in-depth discussion about this video and other tech topics covered on this channel. Thanks again for watching, and I look forward to hearing from you.