Tom here from Lawrence Systems, and we're going to talk about TrueNAS and high availability storage. If you want to learn more about me or my company, head over to LawrenceSystems.com. If you'd like to hire us for a short project, there's a hire button up at the top. If you're looking for deals and discounts on products and services we talk about on this channel, there are some affiliate links down below that help the channel out, and like I said, you get some deals and discounts. So specifically, we're going to be talking about the TrueNAS M50 HA. Now, they sell HA in other forms as well; the one I have in for demo, thankfully lent to me by the TrueNAS folks at iXsystems, is this unit. Someone asked me when I have to return it, and I hope never, but there is an end date on how long I get to play with it. We've been doing some testing with it, I've rebuilt the arrays, and I'm really impressed with it. But that aside, I've already done videos getting excited about the hardware. We're going to talk specifically about how the software handles failover, because that's the reason you buy one of these HA systems: you have something that's absolutely mission critical, and you want the storage system not to fail. Really, what you're doing is mitigating risk. Okay, how do we mitigate the risk of hard drives going bad? We put in a RAID array. That's easy enough. But what about the motherboard that controls all of this going down? How do you mitigate that? Well, in the case of the TrueNAS M50 series and any of their HA models like this one, you put in two motherboards; there are two separate controllers in this chassis. Like I said, I have a separate video on all the hardware details. But what happens? How does it actually work for failover and availability? All right, we're going to do a demo of it, but let me start with an explainer. TrueNAS high availability, explained: they use an active/passive method.
So those two controllers, one on top of the other, storage controller one and storage controller two, are both physically connected to every drive at the same time; they have a total connection to them. In a simpler design, should the first controller fail, the second storage controller would have to register with the storage fabric before it could perform any I/O. Additionally, the second storage controller might not even be powered on and waiting, so it would have to boot up from a cold state. This is the problem with a cold standby controller, and it's why TrueNAS uses active/standby instead. In the active/standby arrangement, every disk is dual ported, allowing the second controller to be connected directly to each disk at all times. The second controller simply waits for the authority to handle I/O operations. Finally, any cache on the first controller can be synchronized to the second controller, ensuring it does not have to be repopulated after a failover event. The end result is that a failover operation can happen in seconds rather than minutes, significantly reducing the chance of a client timeout. As a matter of fact, when you're using iSCSI, it's absolutely seamless. So that's what we're going to demo. Now, they also explain why they didn't do active/active, because I've had people ask: does it do load balancing active/active, since the other controller just kind of chills and does nothing? The problem with that is, if you load balance between the two of them, and you put a load on the system that exceeds what any one controller can handle, it doesn't fail over gracefully at that point: the IOPS go down because the surviving controller can't perform as much, your performance drops, and that may cause an application to crash or cause other issues. So in that circumstance, you really want a system that is active/standby.
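To make the active/standby idea above concrete, here's a minimal sketch in Python. This is illustrative only, not TrueNAS's actual failover code; the class, method, and key names are all made up. It just models the two properties the write-up emphasizes: only the active controller has I/O authority, and the write cache is mirrored so the standby never has to repopulate it after a failover.

```python
# Minimal sketch of the active/standby design described above.
# NOT TrueNAS code; all names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Controller:
    name: str
    active: bool = False
    cache: dict = field(default_factory=dict)  # write cache, mirrored to the peer

    def write(self, key, value, peer):
        """Active controller caches a write and mirrors it to the standby."""
        if not self.active:
            raise RuntimeError(f"{self.name} is standby; it has no I/O authority")
        self.cache[key] = value
        peer.cache[key] = value  # cache sync: standby stays warm at all times

def failover(active, standby):
    """On heartbeat loss, the standby takes authority in seconds: it is
    already booted and already holds a copy of the cache."""
    active.active = False
    standby.active = True
    return standby

c1 = Controller("controller-1", active=True)
c2 = Controller("controller-2")
c1.write("block-42", b"data", peer=c2)
new_active = failover(c1, c2)
assert new_active.name == "controller-2"
assert new_active.cache["block-42"] == b"data"  # no cache warm-up needed
```

The point of the sketch is the last assertion: because the cache was mirrored before the failure, the promoted controller can serve I/O immediately instead of rebuilding state from disk.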
So whatever load the system can handle, the other controller, even though it's not doing anything right now, can absolutely handle in the case of a failure. That's an important aspect of it. I'll leave a link to their write-up, which goes into a little more detail. Now, like any lawyer asking a question in court, I already know the answer before I ask it. Yes, I know the failover is going to work, because I'm showing you the demo here. This is what it looks like when you fail over, and we're going to do it in real time. I've named the systems here Loch Ness 1 and Loch Ness 2. You know, Loch Ness monster, they're big. All right, if I have to explain the joke, it's not funny. This chart shows Loch Ness 1 and Loch Ness 2 in reverse: I had already initiated a failover before I started the video, to show it working, and this is the result of that. I was running a bunch of tests, and we see it ramping up as all the data gets pulled across, because this one was the primary. Then there's a stop point right here at 2:10 p.m. where all the I/O load switches over to Loch Ness 1. We're going to do the same thing in reverse. The HA works using CARP, the Common Address Redundancy Protocol. That means I'm always connecting to 192.168.3.250. Each controller has its own separate IP address, and the CARP address is the shared address between them; everything that connects to the CARP address is connecting to the failover address. So when I'm setting things up, and we'll look over here at XCP-ng, I've got these set up as TrueNAS iSCSI, and that TrueNAS iSCSI storage is connected to 192.168.3.250. It's very important that you do this: don't connect to one of the individual controller IPs on the storage, always attach to the CARP IP.
And what that does is give the two systems one IP by which everything should connect, so that no matter which one fails over, you always have a solid IP. That's just an important detail about how these work. I've done videos on pfSense using CARP; it's exactly the same concept being used here. This is the iSCSI extent mapped right here through TrueNAS. This one's active, called Loch Ness 1. If you mouse over this on the switch, port 13 is Loch Ness 2 and port 14 is Loch Ness 1, both connected at 10 gig. And I have a couple of machines running on this: this Debian on TrueNAS, and this Windows on TrueNAS. For Windows, why not make it fun? What's the most dangerous thing you can think of to do, short of losing an array? Rebooting or unplugging a Windows machine while it's doing an update. Windows just doesn't survive well if it's mid-update and you restart it or lose the storage controller. So we're going to kick off the update. Then I'll kick this off too, because why not run some tests on here? We'll make it a little more taxing: nine iterations, hit all, there we go. We are going to keep this drive really busy doing high numbers of reads and writes while it's doing an update, so there's a lot of I/O activity happening right here. To top that off, why not go over here and SSH into this one? The IP address is going to be 3.142, SSH in as root. All right, let's make this do something; I think I've got the Phoronix Test Suite on here. There we go. It's going to run a little slow because we're loading up the IOPS on the other side; we're hammering this thing right now. So I'll let that go. We'll just say four, five, three, there we go. Nope, don't care about saving results. Now it's going to hammer down and create a bunch of disk I/O back over here on the storage. Look at the TrueNAS iSCSI stats, and here we go.
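For anyone wiring this up from a plain Linux initiator instead of XCP-ng, the same rule applies: point discovery and login at the shared CARP VIP, never at a controller's own IP. A sketch using the standard open-iscsi tools; the VIP is the one from this demo, and the target IQN below is a made-up placeholder, not a real one from this system:

```shell
# Discover targets via the shared CARP VIP (never a controller's own IP)
iscsiadm -m discovery -t sendtargets -p 192.168.3.250

# Log in to the discovered target (IQN here is a hypothetical example)
iscsiadm -m node -T iqn.2005-10.org.example:demo-target \
         -p 192.168.3.250 --login
```

Because the session is bound to the VIP, a controller failover doesn't change the address the initiator is talking to, which is what makes the switchover transparent to clients.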
We're at about 500 megabytes a second right now in transfers from all the little reads and writes, and 50,000-plus IOPS; it's still going up, 54,000 IOPS. So you could say the storage controller is quite busy right now, definitely loaded up. Let's go back over here to TrueNAS. Yep, we definitely see some processor load; it's not killing it, but it's doing some work. We see 18 gigs committed to cache, and we'll keep seeing that get bigger and bigger because it's caching some of those read and write operations that are going on. So this thing is hammering away, doing exactly what we wanted it to do. Then if we go over here to UniFi and refresh, it'll take a second to catch up with the stats, but UniFi will start showing all the transfers going between these devices. Refresh again. All right, there we go. We can see the i7 system pulling a lot of data. We can see Loch Ness 2 sitting here doing nothing, because all the data is going to Loch Ness 1. Loch Ness 1 is the primary right now, as you can see. And what port is that plugged into? Port 14 is Loch Ness 1, and port 13 is Loch Ness 2. Now, over in Windows, it should still be running its updates. Oh look, we're getting some hard drive numbers, and it's pending restart. We're going to let this keep running, tell Windows to restart, and then I'll go pull the plug. So what I'm going to do is run into the other room real quick and yank the cable out, because that seems the most devastating. I could just tell the switch to disconnect the port right here, which would also be unexpected, but why not physically pull the plug? So we're going to pull out plug 14.
You'll see it go dark here when it refreshes. We're going to restart Windows as soon as I pull the cable; as a matter of fact, I'll try to do it all as one operation. This is running, we hit restart, pull it out, and let's just see if we lose any data. All right, cable unplugged. And Windows is still updating. Just for proof, right here is the TrueNAS iSCSI share; this is where the data lives for that particular system. We're still running Windows Update, so we'll let it keep doing its thing and switch over to the other system. Oh, it's still running over here too; we didn't miss a beat on this one. This is the lab machine, still doing its thing. We should get an error on the TrueNAS web UI though, so let's refresh the page: it's checking HA status, waiting for one of the controllers. Now, you do lose the UI for a moment while it does the checking, and then we're switched over to the other system. Your session was logged into the first one, so now we just have to log in again over here. And we get the alert: warning, Loch Ness 2 has had a failover event. We see Loch Ness 2 is now the system in charge, but once again, everything's working. HA shows as disabled because it can't even find the other system, so it gives you an "HA disabled" error; once we plug the cable back in, that will clear. So we'll let it keep doing its thing, and we'll let Windows keep doing its thing. Back over here, Windows is rebooting, not for any bad reason, but because it's finishing loading updates. You can see it didn't actually crash or die. Back over here, this one is still up and running, still responding; obviously, if the storage were broken, it wouldn't respond. Uptime shows it didn't reboot; it's been up for 14 minutes, since we started this video. Windows is still running its update, and now Windows appears to be back up and running. And in the background, no, nothing went wrong here.
We're still just running tests in the background, heavy I/O loads; the system is still loaded up just like it was. Windows is restarting again over here because, you know, updates require multiple restarts for reasons I don't completely understand compared to Linux, but I'm not here to harp on Windows. And everything's up and running. It's as simple as that for the failover. Now, UniFi takes a little more time to refresh its stats, so you're going to see a gap before it pulls the data, and then we'll have another gap here. But what you're going to see is the traffic go flat on one port and pick up on Loch Ness 2, because that's now the primary. All those I/O operations are still happening in the background; they just moved over to the other system. I'm actually going to stop real quick and plug the other controller back in, so it can catch up, and I'll show you how they resync to each other and get ready for the next failover event. Windows is still booting, yeah, still doing its magic. All right, and the systems are all back up. All I did was plug it back in, because it wasn't actually a hardware failure; it was simulated. Because we lost connectivity, it switched to the other controller. Now, ideally, if you're setting this up for HA, you're going to have redundant switches: one controller plugged into one switch, the other controller plugged into the other switch, with everything paired. That way, whether it's a switch failure, a cable failure, or whichever failure you're dealing with, you mitigate the risk dramatically by being able to do that. My setup here is pretty basic, only tied into one 10 gig switch, but you get the idea for high availability: it works seamlessly, this is still running in the background, and we didn't lose anything by switching between controllers.
We were running Windows Update, we disconnected the storage controller that Windows was running on, and we still didn't lose any data and Windows didn't crash, which is risky in general during updates. All right, I'll stop picking on Windows. The important thing is that this high availability with active/standby is very robust and reliable. But someone will always ask me: can't I just store things with, let's say, Ceph or Gluster or many of the other distributed file systems out there that let you take physically separate servers and, so to speak, build them into a RAID array? Yes, that does work. The problem you run into, and it's not unsolvable, it's just a problem that gets expensive to solve, is this: because the TrueNAS system works with dual controllers, and the fabric goes right down to the level of both controllers talking to the same hard drives and synchronizing between the controllers at that speed, you get instant failover and excellent speed. When you tie something together with a distributed file system, Gluster or Ceph just as examples, you're essentially building a RAID between physically separate servers, but you can never synchronize the file system for full HA any faster than the network interconnect between them. So the problem becomes building an interconnect fast enough to handle the load between servers, because if one server in your cluster of servers fails, you have to have that data immediately available. Those are other ways of handling this. TrueNAS is really good at handling the scenario we set up right here. You could build something else and use one of those other systems on top of it, but for TrueNAS, you put it in, connect your virtualization stack over iSCSI, and have redundancy in the entire stack, so you can instantly fail over from a controller or switch failure, with everything dual plugged in.
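To make the interconnect bottleneck concrete, here's a rough back-of-the-envelope calculation. The numbers are illustrative assumptions, not benchmarks of TrueNAS or any specific product: synchronous replication across servers can't acknowledge a write faster than the slower of the local storage and the link carrying the replica copy.

```python
# Rough illustration of why the network interconnect caps synchronous
# replication between separate servers. All numbers are assumptions.

def replicated_write_ceiling(local_write_gbs: float, interconnect_gbs: float) -> float:
    """A synchronously replicated write completes no faster than the
    slower of local storage and the replication link (both in GB/s)."""
    return min(local_write_gbs, interconnect_gbs)

link_10gbe = 10 / 8   # a 10 GbE link moves at most ~1.25 GB/s of payload
nvme_pool = 5.0       # GB/s a local flash pool might sustain (assumed)

print(replicated_write_ceiling(nvme_pool, link_10gbe))  # prints 1.25
```

In this hypothetical, the storage could sustain 5 GB/s locally, but the cluster's replicated write speed tops out at the link's 1.25 GB/s, which is exactly why the dual-controller design, where both controllers share the same drives and skip network replication entirely, avoids that ceiling.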
This will handle that flawlessly at very high speeds. We get excellent read and write performance, and we have excellent reliability right here; that's the scenario you're building out with this. Like I said, it's not the only solution, but when you think about the other solutions, those are some of the challenges you run into: building multiple fast servers and synchronizing the file system between them. The TrueNAS system is attached to each drive from both controllers at the same time, so there isn't any delay in failover, and all the writes are always happening to the same hard drives. The two systems are talking to the same drives; hence the reason this works so well and so fast. Just to refresh UniFi here: you see we're still hammering on Loch Ness 2 because it's the primary, and Loch Ness 1 is now the backup. There's really not any data going to it, just a few kilobytes, because it's talking to that same .250 address and confirming the other controller is still there. If it loses that connection, like when you yank the plug out, it immediately initiates the failover and becomes master again. And we can flip these back and forth: Loch Ness 2 is primary, Loch Ness 1 is secondary, and like I said, the cycle continues by unplugging and plugging back in. So hopefully this clears up a little of how the HA system works on TrueNAS and how it survives a failure. You can fail over the entire motherboard itself, or fail over just by unplugging the network. Either way, when it loses communication, it immediately switches over without dropping the connection or losing any data, and it keeps working so the users don't know what happened. That's the idea of any system like this: you should be able to survive the failure of different pieces, whether it's a drive or an entire controller in your system right here.
You should be able to survive a failure with the users kept working, because when you're doing your job right, they don't know you're doing your job. It's sometimes thankless, but it also puts a smile on my face to say: hey, it failed, we got the alert that it failed, but nobody had to stop working. And that's the important part. All right, thanks, and thank you for making it to the end of the video. If you liked this video, please give it a thumbs up. If you'd like to see more content from the channel, hit the subscribe button, and hit the bell icon if you'd like YouTube to notify you when new videos come out. If you'd like to hire us, head over to LawrenceSystems.com, fill out our contact page, and let us know what we can help you with and what projects you'd like us to work on together. If you want to carry on the discussion, head over to forums.lawrencesystems.com, where we can carry on the discussion about this video, other videos, or other tech topics in general; even suggestions for new videos are accepted right there on our forums, which are free. Also, if you'd like to help the channel in other ways, head over to our affiliate page; we have a lot of great tech offers for you. And once again, thanks for watching, and see you next time.