So let's dive into talking about storage, iSCSI, NFS, and virtualization. Getting performance out of this, there are a lot of factors involved, so I wanted to talk about a few of them, because one of the first things people run into when they start doing things like setting up an NFS share, and generally NFS becomes very popular and is not a bad way to go, is that you right away run into some of these issues and questions and test configurations and diving deep into this. So we'll start with this article here. And by the way, I'm gonna leave a link to all three of these articles. It's up to you if you wanna read them before or after you watch this video, but they're going to have more depth on some of the really specific details inside of here, and I may or may not cover some of these. I wanna show you some of the practical side of the knowledge I gained from reading all of these articles.

So we'll jump right to the summary. NFS by default will implement sync writes as requested by ESXi clients. Now you can insert ESXi, Proxmox, whatever your hypervisor is, in my case it's gonna be XCP-ng, but Citrix, you could fit that one in there as well. It's the way the synchronized writes happen with NFS. This is one of the reasons that iSCSI can appear faster. Well, it technically is faster, but it just comes back down to that whole integrity versus speed question and what is an acceptable amount of risk when it comes to your data.

So one of the options is doing NFS with sync disabled. We're gonna test that. Some people use sync-disabled NFS to gain speed. This causes asynchronous writes for your VM, and yes, it is lightning fast. However, in addition to turning off sync writes for the VM data, it turns off sync writes for the ZFS metadata. So this doesn't just affect the VMs, as in the connection between the VM and where you're storing it; it also turns off syncing for the ZFS metadata and the ZIL. So this may be hazardous both to your VMs and to the integrity of that particular ZFS pool. Those are the risks, and we're talking about catastrophic failure, like absolute power loss or drives failing while data is in flight.

iSCSI by default does not implement sync writes. As such, it often appears to be much faster and therefore a much better choice than NFS. However, your VM data is being written asynchronously, which is hazardous to your VMs. On the other hand, the ZFS file system and pool metadata are being written synchronously, which is a good thing. So we're gonna explore how that works in practice. This also means iSCSI is probably the way to go if you refuse to buy an SSD SLOG for your device and are okay with some risk to your VMs. So it is probably a good way to go, but you do require a SLOG device.

Now I'm not gonna jump into this entire article, but this is the ZFS ZIL and SLOG demystified. It is not a cache. It is the ZFS Intent Log, and the separate log device, or SLOG, is what they call it. And while using a spinning hard disk for the SLOG will yield performance benefits by reducing duplicate writes to the same disk, it is a poor use of a hard drive given the small size but high frequency of the incoming data. What they're saying is you really wanna use SSDs for the SLOG, or to bring this up to 2019 terms, because this article is from 2015, not just solid state, let's go all the way to things like NVMe. And of course, when you get into the higher end enterprise equipment, this goes even further: they make specialized devices that essentially operate at more like RAM speed in order to do this intent log.
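For reference, the sync settings discussed here get changed through the FreeNAS UI later in the video, but from a shell the equivalent ZFS commands would look roughly like this; the pool and dataset names are hypothetical, just for illustration:

    # Sketch of the sync property on a dataset or zvol (names are made up):
    zfs set sync=disabled ssdtank/nfs-share    # async writes: fast, higher risk
    zfs set sync=always   ssdtank/nfs-share    # commit every write before acknowledging it
    zfs set sync=standard ssdtank/iscsi-zvol   # honor whatever the client asks for (the default)
    zfs get sync ssdtank/nfs-share             # check what a dataset is currently set to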
And this is where there's a lot of reading and writing going on to get the writes synchronous. And this is very relevant because we're talking about how VMs sync, so how the data stored in a virtual machine is written there. This adds a lot of complexity, and because of the way a file system that has been virtualized and is running inside a VM changes, there's a ton of little writes going on. It's not like normal file writes where I'm just copying a file over to a server, I changed one file out of 10,000, and the OS is handling it when you use a share, for example, on FreeNAS and ZFS.

All right, so let's jump into the hardware setup and how this is actually configured. First, this is the FreeNAS Mini XL Plus that is on loan from the folks at iXsystems that we're using for demonstration. It is hooked up at 10 gig. The server is going to be this box here, so we'll go over here to hosts for those wondering. This is a Dell PowerEdge R720XD provided by the folks over at Tech Supply Direct. We have an offer code with them if you wanna check it out below; it gets you 10% off purchases of servers, workstations, or anything you wanna buy from their website. They have some excellent deals. We've bought servers from them, and this one was provided as kind of a sponsorship for the channel, so I really wanna thank them. And if you want 10% off, we have an affiliate link below that gets you 10% off any purchases on their server equipment.

All right, let's jump into how this system is configured in more detail. So this FreeNAS Mini XL is connected via RJ45 at 10 gig. I have a whole review of this box and I'll leave a link to it that shows how it works and all the little hardware details of it. It's running an Intel Atom C3758 at 2.2 gigahertz, and these are all SSDs that we have for the disks in use. So everything we're gonna be doing is 100% tested with SSD drives. They're just some SanDisk 240s, nothing high performance, nothing really high end, they're consumer drives. XCP-ng is also connected at 10 gig. So the storage is connected at 10 gig, the host is connected at 10 gig; we're actually running everything over just the one interface, but it is a 10 gig connection. And I set it up here for demonstration just so people know. No magic here, I'm not that good at editing to create a bunch of lies, you just have to take my word for it. Yes, it talks at a full 10 gig, 9.41 gigabits a second, no problems there. So that's all that.

For those wondering how I'm splitting the screen, because that always comes up first, and you can see I've already been doing some testing here, we'll clear all this: this is tmux, and I've just split the screen so I can have each one of the VMs that I log into down here, and up top, so we know what we're running here, is a shell on the FreeNAS. And in that pane right there, all we're doing is running zpool iostat -v for ssdtank, that's the name of the pool on there, updating every one second. That's just so we have some information rolling by so we can see these reads and writes happening. Go back over to here, and both of these are started up, so we'll go ahead and log into them. Like I said, I've already been running a lot of tests; we'll jump into all the details of the results and everything else here. So this one, the Debian lab on NFS FreeNAS Mini XL Plus, we'll log into that one here. And then we'll go over here, get the IP address of this one, which is 158, and we'll log into that one here. Oops.
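To be concrete about what that monitoring pane is doing, and how the screen is split, here's roughly what's being run (pool name as stated in the video):

    # Top pane on the FreeNAS box: per-vdev read/write activity for the pool,
    # refreshing every 1 second
    zpool iostat -v ssdtank 1

    # The split screen is just tmux: Ctrl-b % splits side by side,
    # Ctrl-b " splits top/bottom, Ctrl-b plus an arrow key moves between panes
    tmux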
Okay, both of these are logged in and ready. And to make things as clear as possible when I'm doing these demonstrations, this one says Debian FreeNAS Mini XL Plus NFS, and this one says Debian FreeNAS Mini XL Plus iSCSI. And to show you the way the pools are set up: here is the iSCSI one, set up as a zvol, so it's set up as block storage through iSCSI. Here is the SSD test right here; this is set up as NFS. So this one's iSCSI, and this is the NFS test. I labeled it SSD test and forgot to add NFS to it; I guess I could always edit the comments on there, but you get the idea. This is the iSCSI one, this is the NFS one. They're both running on the same disk pool, so we're eliminating all the different variables; we're seeing it directly, because it's the same FreeNAS box, the same disk pool, and we're seeing how this is all set up.

Other details that are noteworthy: we're gonna go here and edit options, and like we had mentioned, we have sync disabled. This is the higher risk but better performance setup when you're running an NFS share. Now with the iSCSI here, we're gonna go ahead and, oops, edit the zvol. It's standard sync, so nothing special. That's just the default, because by default with iSCSI when you create a zvol, it's gonna inherit sync from the dataset, and the dataset by default wants standard sync. So that's all set to be normal as far as that goes.

So that should give us the best performance on NFS, and that's where we're going to start: what kind of performance do we get out of this NFS system? Over here, what you're seeing at the top is the four drives, and I have a cache drive in here. I left the cache drive in because a lot of people think you can get a lot more performance when you're dealing with virtual machines out of a caching drive. This is not the same as a ZIL, it's the read cache, and that's kind of limited, because it's better to put more memory in a system, since it will pull reads from memory. This is one of the things that ZFS does really well; a lot of people think ZFS is a memory hog, but it's actually pushing all the constantly read data into memory, so it does benefit in performance from having a lot of memory. But I went ahead and threw this in here, and you can see how much it's being used. Even with these running, you can see these little writes happening here and there, and it's just not caching much. But I'll leave it up there so you can see what's going on.

First thing we're gonna do, and I'll show you of course how this works: a speed test. We're just gonna create a file and delete it. I think that created a two gig file. So what kind of speed do we get out of this? 949 megs a second. So we'll run it again: 975. So you get pretty fast speed out of these four drives. Now this is RAIDZ1, if you didn't notice up here, in case you're wondering. And like I said, in those articles I'm gonna leave links to, there are breakdowns of the performance difference between Z1, Z2, et cetera, and spreading the writes across more drives.

So now we're gonna run the speed test over on the iSCSI one here and see what kind of performance we get out of this. 731, not bad; we're gonna do it again. You run them a few times and it's pushing the data right in: back to 787. So cool. Once again, cache had nothing to do with it; all it's doing is reading and writing a simple file to the drive. We're gonna do something more complicated here next, and we're gonna run the Phoronix Test Suite IOzone test. We're gonna do, we'll say, a 64 kilobyte record size; all right, so option two, for writing two gigs of data, and option one.
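The exact speed-test script isn't shown on screen, but a common way to do the same thing, plus a sketch of how the IOzone run gets kicked off through the Phoronix Test Suite, would look something like this (the dd flags and the test-profile name are my assumptions, not taken from the video):

    # Write a 2 GB file, force it out to storage so dd reports a real number,
    # then delete it
    dd if=/dev/zero of=testfile bs=1M count=2048 conv=fdatasync
    rm testfile

    # The IOzone benchmark, driven through the Phoronix Test Suite; it then
    # prompts for record size, file size (2 GB here), and the test to run
    # (write performance)
    phoronix-test-suite benchmark pts/iozone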
And let's see what the write performance is on this. So you're gonna watch this up here happen; we're seeing all the drives spin up and all the writes occurring over here as it runs the test. Let this test complete. All right, this was able to complete; it looks like it's 694. All right, let's run the same test over here, same options: option two, option two, write performance. And the reason we're doing write performance is because read performance gives you ridiculously big numbers because of the caching. This tool doesn't do a good job of testing read performance, because it comes up with some really artificial numbers that are unrealistic for the actual speed of the drive. All right, now that completed. So we have 694 over here with NFS and we have 530 here with iSCSI.

So you're probably thinking, oh, NFS all the way, definitely a much faster system, et cetera, et cetera. But there's more to the story. This first part was without integrity; this is the high performance but higher risk setup. And for my lab, I think this is a perfectly fine setup. For production, I don't accept that as an option, because, you know, occasionally things happen. Even with a UPS, even with proper shutdowns, something could happen; you want the most integrity. So let's see what happens when we turn on ZFS syncing. We're gonna go over here and we'll go to this here, and we're going to edit options, sync always, save. And you can do this without shutting down the VMs; this is perfectly fine.

And let's just run, we don't even need to do much more than run the raw speed test. So we'll speed test here. We have the number pulled up, 776, so run it here. Okay, this is where things slow down. It's going to take a while to even complete the speed test. If you remember the original speed test from before, right here, and we'll actually scroll back while we're waiting: if we scroll all the way up to the top here, we were getting 975, 949. And you can see all the writes happening, but when you tell it to sync every change as it's occurring, we lose about 90% of the speed without a SLOG device, even though we're running SSDs. So if you're wondering if SSDs still need a SLOG device: absolutely. We're down to 106, so almost 90% of our speed is just gone. It's substantially slower, and I'm not going to bother running the IOzone test; it's going to be the same story, really, really poor performance. So right here is telling you that if you really need sync turned on, you need a SLOG device.

So now what happens if we turn the syncing on and add a SLOG device? We're going to do both here. We already have syncing turned on, so now let's go ahead and, once again, you don't need to shut down the VMs for this, because devices like SLOG and cache can be dynamically removed from or added to a pool. We can go here, we're going to go add the log device, and this is another SSD we have in here. So we're going to go ahead and select this over here and extend, confirm we're extending the pool, and watch what happens over here. This is going to refresh in a second and it'll show up. There we go. Now let's just run the same test now that we have this SLOG device.

Now a lot of people ask how big it needs to be. This is way bigger than it needs to be, by the way. It can be a really small drive, because you're only putting as much data on it as can accumulate over a short period of time. They detail that; there's a formula you can calculate, but they can be very, very small drives. And I'll demonstrate that here.
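Adding the log device through the FreeNAS UI is the same operation as extending the pool from the command line; a rough sketch, with a made-up device name, plus the commonly cited sizing rule of thumb (my summary, not a quote from the article):

    # Log (and cache) vdevs can be added and removed on a live pool
    zpool add ssdtank log da4       # attach the extra SSD as a SLOG
    zpool remove ssdtank da4        # and it can be detached again later

    # Rule-of-thumb sizing: the SLOG only has to hold a few seconds of
    # incoming sync writes before they're flushed to the pool. For example,
    # 10 GbE is about 1.25 GB/s; times roughly 5 seconds per transaction
    # group is about 6-7 GB, so even doubled for safety a ~16 GB device
    # is more than enough.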
Run the same speed test, and the only thing we did was add a SLOG drive. So here we actually see quite a bit of data being written to the SLOG drive. Now like I said, these are all SSDs, and this one is not any faster than these. So we went from 106 to 426. Now let's run the IOzone test again to see what type of performance changes there are here. So we'll run it the same here: two, two, one. Now the reason I'm running this test again is because it's not gonna be as fast as it was. Once you turn on synchronization, even with this device here, there are still slowdowns, there are still performance hits that happen; it's just not going to be quite as fast. But you have very high integrity, because it's absolutely synchronizing and writing to the ZFS metadata every single minor change that happens to the system. So it's not that ZFS with NFS is bad, but if you want 100% top-notch integrity, this is the way to get it. But that does still come at the expense of not being quite as fast.

So it's almost done running here and we'll see what the results were. Down to about 300 on this particular test. So that is a far cry from, if we scroll up here, the 694 that we had before. So you can see there's quite a bit of a difference, but that's still not the full story. Matter of fact, when we run this a couple of times, it may even improve a little bit as it expands the virtual disk. We'll run it one more time while we're waiting, so two, two, one, and see if it's a little bit better the second time, but probably not by much. Sometimes you get small variations after you add something; I don't know if it's because it's caching or expanding the drive a little bit, but you'll see some small variations. But you can see the integrity comes at the expense of speed.

But we're gonna dive deeper into this, because this is still just one thing happening at a time. The idea of an entire virtualization stack is that so many things can happen at a time. And you'll also notice we're using a little bit of the cache, like kilobytes of the cache drive here, so not much. That's one reason I left that in there: even with the iSCSI over here, it's not going to create a big performance difference. So this is running, and let's see where we end up. It should be just about finished. There we go. Yeah, 292, pretty much the same as before.

But here we are over at the iSCSI, and we run the same test again: two, two, one. And while that's running, I want to point out something else. All this is fine, and we're running each one of these individually, but let's take a look at the stats that we're getting from the storage devices. So storages, and we'll open up each one separately. There's the iSCSI on the FreeNAS Mini XL Plus, and here's the NFS on there, and we'll go to the stats page. So we see the pretty impressive IOPS that we're getting out of there. And if you're not familiar with IOPS, this is an important number when it comes to determining virtual machine performance. It's not just about the raw performance of how fast we can transfer data back and forth; there are a lot of operations going on, especially with the sync operations, and especially with a lot of VMs, maybe 20 or 30 VMs you have running on there, all doing things. You need IOPS, as in how many operations the storage can handle. Then we have the IO wait time, which is how long we were waiting for the IO, and then we have the latency here. So these are all the different factors; that's why these stats are here.
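If you want to see the same kinds of numbers from inside the Debian guest rather than from the FreeNAS or XCP-ng reporting pages, the stock iostat tool works; assuming the sysstat package is installed, something like this shows per-device IOPS, average wait time, and utilization once a second:

    # Inside the Debian VM
    apt install sysstat
    iostat -dx 1    # r/s and w/s are the IOPS, await is latency, %util is how busy the disk is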
And then here on the iSCSI one, we just ran that test here, so we've seen about 450, and 493 when we ran the test before, but we only have two milliseconds of latency, and our IO wait time hit 23%, versus over here where we were hitting 31% and four milliseconds. But we did see some bursting that was a lot faster. So those are gonna be some of the factors that play into it.

Now, why am I bringing all this up? Let's go ahead and do this side by side as we start running the tests. And there we go, that lines these two up better. So now we have these two lined up, and we can run this here and push it quite a bit further. So we're still getting 520 out of here, but still 290 here, so it's not as much. But these are of course benchmarks, which sometimes don't always get things right, so I wanna show some more real world here.

One of the things you're gonna see right away is when you start creating snapshots. Because snapshots, which we love, and this is a wonderful reason for running any of these, so we're gonna hit the NFS one with a snapshot here, is that you wanna be able to snapshot your VMs. This is one of the huge things: before you make a change, you take a snapshot, so you can revert back to it. It makes my backups easier and things like that. But snapshots come at a performance price. This is one of the reasons, when you're doing storage planning and designing, I've never understood people who wanna store everything inside the VM. For example, if you have a file server, don't store it all in a VM, because if you wanna create any snapshots, you're trying to keep differentials, and we have clients with a few terabytes of data and thousands of little documents, because of course they're an enterprise. If you store them all inside the VM, that would be very difficult, because now you're snapshotting at this level and taking a big performance penalty. At some point you want the file system to be directly attached to some other storage device for storage. That's a talk for another day about storage planning; here we're gonna focus on the VMs. But this differential that has to be created means we have to keep track of the changes between the current state of the operating system that's running in the VM and the snapshot, so that I can, at any given time, revert this back to the snapshot.

So, what does that look like in terms of this? If we run here, we'll just actually do a speed test, because the first time we run it, it's gonna be even slower than the next time we run it. So we'll run it once here, and you'll start seeing we lost some speed: we went from 400 down to 373. We run it again and it runs a little bit faster, because it's now created at least one delta that's about that big; between the two, about the same. Now let's run it over here, and we're still getting about the same speed. Now, why do we lose speed over here but not over here? Well, it's the way it treats those deltas. The deltas are separate individual files that the underlying operating system is handling. So if we go here and cd into the share's mount point, here are the files we just created, and what these files are is the delta differences. So we have the main VM, and then we have the delta differences between those, and that is what's occurring here. So we wrote a two gig file, but it seems that we have 3.3 gigs here.
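Just to make that concrete, here's a hedged sketch of what looking at those files might look like; the mount path is hypothetical (on FreeNAS a dataset lives under /mnt/<pool>/<dataset>, and on the XCP-ng side an NFS storage repository gets mounted under /run/sr-mount/<sr-uuid>), and each snapshot adds another VHD on top of the base disk:

    # List the virtual disk files on the NFS share; after a snapshot you see
    # the base VHD plus a smaller delta VHD per snapshot
    ls -lh /mnt/ssdtank/ssdlab/
    du -sh /mnt/ssdtank/ssdlab/*.vhd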
So it's telling it to go create this other file, create this delta, and create separation between it and any changes that occur, so we can always revert back to them, and log those changes in there. It's basically XCP-ng, or whatever your hypervisor is, talking back to the storage system on the back end to create these files and put this transaction data here. So any time you create transaction data in this one, it then has to be replicated over here, or the differential between them, the delta, and this obviously comes at the price of performance, because when you have a bunch of snapshots, it has to keep track of all those snapshots. Then you compound the problem by having a whole lot of VMs running and a whole lot of snapshots, and you can see how this can add up really quickly, and then there's some management overhead that comes with all that.

iSCSI, on the other hand, is presented as a block device to the hypervisor. So XCP-ng handles that by talking directly to it as if it's a hard drive attached to it, and that's why we can't see individual files even from the command line. When you look at it inside FreeNAS, it's just a block device, a zvol, presented as if it were a hard drive. So all those transactions occur by just talking to it as if it were a hard drive. It's a fundamentally different way that iSCSI works, which means we lose less performance on it. So when we do this speed test, with or without those snapshots, we still get reasonable performance.

So let's look at the performance now that we have this. The last one before the snapshot was 292; what are we looking at with the snapshot? Two, two, one, we'll let that run. Actually, we'll put it at the top here and pull the stats again. We see all the reads and writes happening, the ZIL and cache doing their thing, and once again it's not just doing its thing for that one VHD file, it has two to keep track of. So we're at 291, and let's scroll back again: 292. Scroll back a little further: not too bad, we're keeping up pretty decently with it, but it's not near what it was before. So over here, same thing, we've got a snapshot. Look over here: two, two, one. See what kind of performance we get on the iSCSI. And it completed. Interestingly, it had one run that apparently was 376, maybe it was some leftover writes happening, so one run slow, but the rest of the time it was consistent, and we ended up with 508.
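One way to see that difference from the FreeNAS shell is just to list the two kinds of datasets; the iSCSI side is a single zvol with no files you can look at, while the NFS side is a regular filesystem full of VHD files (pool name from the video):

    zfs list -r -t volume ssdtank       # the zvol backing the iSCSI share: one opaque block device
    zfs list -r -t filesystem ssdtank   # the dataset backing the NFS share, which holds the individual VHDs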
So even with these snapshots we didn't kill the performance over here, but for the hell of it, let's go create another snapshot so there are more things to keep track of. So a new snapshot again, and we'll do the same over here. And let's look at this, I want to look at the performance real quick. So here's the NFS one, here's your IO speed, IOPS. We still get more speed out of here because of the way the transactions are working, so we actually still got better performance out of the iSCSI: it was able to perform at 11,000 IOPS versus only 7,000 over here.

Let's go back, and now that we created more snapshots, let's just do the raw speed test because it's faster. And like I said, I've commented many times: lies, damn lies, and benchmarks. There's only so much you can do, but they at least give us an idea of the performance difference. And even with more snapshots, we still don't lose anything over here, because it's all handled within the system directly.

Now the other thing we're gonna do is go ahead and purge out these snapshots. So purge that one, purge this one, we'll purge out this. I will also mention, when you do the purging, or actually let's go here, cd into the mount, and let's see how fast this happens. Okay, it did purge them. Depending on the amount of IO activity, there is a time it takes to coalesce, basically, and do a cleanup of all those files that got created on there. So that's worth mentioning: when you do these, sometimes you have to wait a minute for the IO to settle down, because it takes time to coalesce before the speed gets back to where it should be. Looks like those things may have coalesced fast enough, and you can look up coalescing if you're curious; it's not completely coalesced, because this one is still performing a little bit slower and this one's performing well.

Now the last thing I wanna test here, to kind of show this, is we're gonna do four, all options, five, all options, and just go for write performance. We'll do the same thing over here, so four, five, one. So now both of these are just beating up the same pool of drives, intensely fighting for resources. Let's see how that's being handled on the back end here. We're gonna look at this here and let this ramp up, so we're watching all the drives; we'll let this run for a second and see what kind of IOPS we get. Like I said, they're competing for resources on here.

And we're gonna go over here inside our FreeNAS and go to Sharing, no, not Sharing, Services, and start and stop this, because if Netdata is running, by the way, and you add and remove cache drives, sometimes it decides not to show them. So doing this, there we go, we can see that the disk writes are being pushed pretty hard, the CPU is being pushed pretty hard, but not at 100%, it's not completely used. And as a matter of fact, I'll comment as well, FreeNAS is very responsive despite all this disk writing going on in the ZFS system. So you can see what's going on here; there's a little gap where I just turned it on and off to make sure I had all the drives in here. So we can see the demand and prefetch metadata read and write performance going on right here, not much reading but a whole lot of writing, and this is the data demand efficiency. We're gonna see this red right here; this is where it's just, you know, actually writing and exhausting caches and getting information where it needs to be.

So go back up here to the system overview and let's look at these now. So, competing, we're back up to better performance over here on the iSCSI side: we're seeing 496 peak with 11,000 on the IOPS, versus 275 peak and 6,000 on the IOPS. And our IO wait time and latency: two
milliseconds of latency here, four milliseconds of latency there. IO wait is around 22% versus IO wait of just 20%, so really close; I mean, I'm seeing some peaks here, look, we got 29% on there, so pretty reasonable, but you're still seeing that iSCSI edging it out.

Now I'll come back to the article one more time, as I do wanna mention something here just to reiterate. As I said at the beginning, iSCSI by default does not implement sync writes; as such, it often appears to users to be much faster and therefore a much better choice than NFS. However, your VM data is being written async. So the VM data itself is being written async, but the integrity of ZFS behind it is in sync. So potentially, and this actually is true any time you're dealing with a block level device being presented to one operating system from another operating system, which is what iSCSI is doing by presenting the zvol as a hard drive to whatever you're going to save on there, there is an inherent risk, because you're taking and changing blocks on the hard drive. Just like if you unplugged a hard drive mid session, there is the potential for corruption in there. And one risk you may face is, if that corruption corrupts the entire zvol, all the VMs within that zvol, well, they could have a problem. This is one of the reasons backup is so critical, and why having a UPS and proper shutdown management, so you don't have sudden power loss, is critical. So there is that potential for it; yes, there's some checking that goes on, but think about this just in the perspective of unplugging a SATA cable from a hard drive that's in the middle of an internal write process scattered across the files, or something like a defrag or the coalescing that may occur with VMs. These are things that are happening all the time, especially in a busy VM environment. So there is an inherent risk of data corruption really with any system, your hardware going bad and writing things badly. This is just something to think about, and more reasons you should have snapshots for any of these. It's one of those things: iSCSI is really good on performance, but because you're not writing solid individual files, it's all one block device presented to the operating system, so there is always the chance for corruption in a catastrophic failure. Versus, well, any VHD files that are saved to the share in the case of using NFS: those VHD files are all individual files, so it may not corrupt all the VMs. It may corrupt the one that was doing the writing, but the other ones, and especially any of them that are turned off, there's no risk of corruption in those, because, well, they're not doing anything, they're off, so there's no active writing going on.

So, throwing the options out there: for a lab environment, I think NFS, and this is actually what we use and why, our lab environment is NFS with sync turned off. I find it to be an acceptable risk, and I get the absolute best performance out of it. But as always: backup, backup, backup. I back up everything any time I think there's going to be a significant change, even to our lab environment. We have a system dedicated just to doing all these different delta backups for the lab, that way they can be backed up very, very fast any time I make any large updates or changes to them, so that in case there was ever some catastrophic failure, no big deal. Not to mention, the lab machines I use for YouTube demos, or for testing a theory I have, or something a client may want set up, are not mission critical and can be reloaded. I back them up more just for the convenience of not having
to reload them as I do things. But like I said, these are some of the options out there. Those articles will let you dive deeper into ZFS, and for those of you who really want to dive into performance, I've mentioned this article many times: it's such a good article on all the different write performances and read performances of all the different RAID options in ZFS and what they mean, and it's a good dive if you want to really get a grasp on how all that works.

All right, thanks, and thank you for making it to the end of the video. If you like this video, please give it a thumbs up. If you'd like to see more content from the channel, hit the subscribe button, and hit the bell icon if you'd like YouTube to notify you when new videos come out. If you'd like to hire us, head over to lawrencesystems.com, fill out our contact page, and let us know what we can help you with and what projects you'd like us to work on together. If you want to carry on the discussion, head over to forums.lawrencesystems.com, where we can carry on the discussion about this video, other videos, or other tech topics in general; even suggestions for new videos are accepted right there on our forums, which are free. Also, if you'd like to help the channel in other ways, head over to our affiliate page; we have a lot of great tech offers for you. And once again, thanks for watching and see you next time.