Welcome to Windows Server Engineering Summit. Today I'm talking about disaster protection with SR, Storage Replica, in Windows Server 2025. My name is Ned Pyle. I'm a principal program manager in Windows Server. You may know me from SMB, DFSR, Active Directory, file services. But I'm also a designer and creator of Storage Replica, which we first released in Windows Server 2016. We have a lot of big news in that space that I'm going to talk about today. I've only got 15 minutes, so I've got to be fast. So let's talk enhanced performance and compression, and then I will show you lots of practical demos. Let's go.
Storage Replica helps organizations survive disasters. It helps you keep your job. SR is designed for you to stay employed. It protects companies, organizations, businesses from gigantic forces outside of their control. Those forces can be earthquakes, hurricanes, wildfires. They can happen at a regional level, at a city level, at a town level. And depending on the size of your organization, that's a big enough disaster to affect you. Just in the last year, and this is National Weather Service data, these are the billion-dollar natural disasters that happened in the United States in 2023. Every one of these things, whether it's a wildfire or a flood or a tornado or a giant snowstorm or whatever, represents at least a billion dollars' worth of damage. And SR is here to protect you from that, to give you continuity, to make sure that your place can keep running.
So what have we got? In 2025, we have something called the Enhanced Log. Storage Replica writes data blocks above the partition, I/O blocks, into these logs, ships them between machines, and uses them to reconstruct the data blocks on the other machine. So the performance of this log is key to Storage Replica's performance. It is the engine. It is the thing that does the replication.
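As a rough illustration of the log idea just described (write each block change into a log, ship the log, replay it on the other machine), here is a toy Python sketch. It is not how Storage Replica is implemented; `Volume`, `LogRecord`, `replicated_write`, and `ship_and_replay` are hypothetical names made up for this example, and the 4-byte block size is only there to keep it readable.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # toy block size in bytes (assumption for illustration only)


@dataclass
class LogRecord:
    """One logged write: which block changed and its new contents."""
    block_index: int
    data: bytes


@dataclass
class Volume:
    """A toy volume made of eight equally sized, zero-filled blocks."""
    blocks: list = field(default_factory=lambda: [bytes(BLOCK_SIZE)] * 8)

    def write(self, block_index: int, data: bytes) -> None:
        self.blocks[block_index] = data


def replicated_write(source: Volume, log: list, block_index: int, data: bytes) -> None:
    """Record the write in the log, then apply it to the source volume."""
    log.append(LogRecord(block_index, data))
    source.write(block_index, data)


def ship_and_replay(log: list, destination: Volume) -> int:
    """Drain the log in order, applying each record to the destination."""
    shipped = 0
    while log:
        record = log.pop(0)
        destination.write(record.block_index, record.data)
        shipped += 1
    return shipped


source, destination, log = Volume(), Volume(), []
replicated_write(source, log, 2, b"ABCD")
replicated_write(source, log, 5, b"WXYZ")
replicated_write(source, log, 2, b"EFGH")  # a later write to the same block
shipped = ship_and_replay(log, destination)
print(shipped, source.blocks == destination.blocks)
```

Replaying the records in order is what makes the second write to block 2 win on both sides. Every write passes through the log, which is why the speed of the log bounds the speed of replication.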
And this new Enhanced Log, which we sometimes call the raw log, reduces latency, increases throughput, and is based on a different type of technology. So it requires a new partnership. It is in Windows Server 2025 only, but you can still use 2025 servers with SR in its older mode, called CLFS logs, with 2022, 2019, 2016, anywhere SR lives.
And then there's compression. Storage Replica uses SMB, and you know me from SMB, as its fabric, its transport. And because SMB does compression, we plumbed SR to do compression. That can give you really significant network savings. It can save you a lot on inefficient I/Os that write a lot of effectively empty blocks into the data. So on a crappier network, or on a highly congested network, you can see real savings. This is now in Server 2025 and all of its editions. It actually shipped in 2022, but only in Azure Edition, so you may not have been aware that this functionality was there. Now it's available for everybody. And it's primarily for asynchronous replication. Obviously SR does both synchronous and asynchronous.
All right. So this new log, plus compression, plus asynchronous gives me, I think, a very viable alternative to using DFSR. Everybody's seen DFSR. It came out in 2003 R2. It's what you use to keep your SYSVOLs in sync automatically, but its custom replication is a very complex, large file replication system that's been around for almost 20 years. And it's got some real limitations, especially at its age, as a disaster recovery, disaster prevention option. It only works on unlocked files, so locked, in-use files don't replicate. It's extremely latent: minutes, hours, days latent on changes. It has, at this point, pretty small maximum size limitations, just because it was designed for a world that no longer exists. It's extremely slow in initial sync.
It's multi-writer, which is advantageous in some ways, but can cause its own set of disasters, with data overwrites from multiple users accessing multiple copies of the data at various points in between replication. It has a very complex and rather finicky staging system that requires a lot of tuning and careful math to make sure your replication works optimally, and a lot of space reservation to make sure it will replicate at all as your file sets get larger. And because it's so old, again, it's from 2003 R2, it never knew about things like ReFS and newer file systems. It was completely incompatible with them. And most of its really significant development ended in 2008 R2. There were some changes in 2012, some smaller changes in 2016, but for the most part, DFSR is just a maintained product now that's as good as it's ever going to get.
So when could I use Storage Replica instead of DFSR? I need zero data loss, or minimal data loss. I can't be dealing with locked files. I can't be worried about replication being out of sync for very, very long periods of time. I've got lower-latency, higher-bandwidth networks, although compression will help here. Ideally, if you're going to push this much data, you're going to want better networks; DFSR was designed to work on modem networks, extremely narrow networks. And I only need two copies of the data. Where DFSR supported thousands and thousands of copies of the same file, SR just does a source server and a destination server, and that's it. And finally, I don't care about this multi-user case. I don't want users writing to 14 different copies of the file and then a complex algorithm deciding who won.
Okay, so let's talk about tackling some of DFSR's disaster prevention failings with Storage Replica. First, the initial sync. Here is DFSR, the old DFS Management snap-in, familiar to anybody who grew up on Windows Server 2003. I'm going to go ahead and add a couple of servers here.
These are decent servers. They've got all flash. They've got 10 gigabit networking between them on TCP. I'm trying to make this a fairly reasonable case. I'm going to set up replication. I'm going to set up the primary; in this case, it's going to be 421. Keep that in mind. And I'm going to set up which folder I'm going to replicate; in this case, it's the RF1 folder on the E: drive. Then I'm going to add the other server, which is 423, turn on replication, give it a place to replicate to, and let it rip. And that's sort of an optimistic way of putting things because this is, of course, DFSR, and there's not a whole lot of ripping going on.
So we'll go look at the event logs. I'm looking for my 4112 event. That's my "I've started" event. There it is, okay. So here's where I've initialized the replicated folder. I've got a primary member and I'm waiting for all that data to get pulled by the secondary. So I'm calling the secondary and I'm looking for a 4102 event that says, I've heard about this and I'm starting. We're going to catch this and just watch and see how long it takes. We can see how big this is: I'm replicating about 400 gig of data, about 10,000 files, a mixture of all kinds of files, some big, some small.
So we've been waiting for a while. We're at 72 gig. Let's wait longer. All right, we waited a lot longer. We waited three hours and it's finally done with the 400 gig. Over 10 gigabit, that's not great. And what's worse is I sort of cheated here on DFSR's behalf. I tuned staging, I made a bunch of little fine configuration changes. I own DFSR; I'm supposed to be an expert on this. So I made DFSR as fast as I could really make it.
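To put the three-hour figure in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it assumes "gig" means decimal gigabytes, treats the 10 gigabit link as an ideal 1,250 MB/s with no protocol overhead, and uses the rounded numbers quoted in the talk.

```python
def mb_per_second(size_gb: float, seconds: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return size_gb * 1000 / seconds


DATA_GB = 400                    # data set size quoted in the talk
DFSR_SECONDS = 3 * 3600          # tuned DFSR initial sync: about three hours
LINK_MB_PER_S = 10_000 / 8       # 10 Gbit/s expressed as MB/s (ideal, no overhead)

dfsr_rate = mb_per_second(DATA_GB, DFSR_SECONDS)
utilization = dfsr_rate / LINK_MB_PER_S
ideal_seconds = DATA_GB * 1000 / LINK_MB_PER_S

print(f"DFSR average rate: {dfsr_rate:.1f} MB/s")
print(f"Share of link used: {utilization:.1%}")
print(f"Ideal wire time:   {ideal_seconds:.0f} s")
```

Under those assumptions the tuned DFSR sync averages roughly 37 MB/s, about 3% of what the wire could carry, against an ideal transfer time of a little over five minutes.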
Honestly, if this had been an out-of-the-box setup and I hadn't touched any of that stuff, this initial sync, especially with the types of files I was replicating, some of them very large, multi-gig files, might have taken 24 hours, it might have taken 48 hours to finally reconcile, forcing its way through the staging and the high watermarks and the flushing operations and all the shenanigans that DFSR does.
Anyway, let's get on to the good stuff. Here's Storage Replica. Now, why do I call it the raw log? Check out this volume I just created. It is, in fact, raw. And that's sort of where we're storing this log. It's actually based on Storage Spaces; that technology is where this log is actually kept. So now I'm going to put in my server. That's 421 again. I'm just picking a different drive. Instead of using the E: drive, which is where DFSR was, I'm picking the G: drive, another flash drive on exactly the same hardware. And I'll add my other server and pick another drive. I'm going to set this for asynchronous, just to match my use case, where it's a file server that I want working on, you know, worse networks. I don't need synchronous replication. And I'm probably going to use compression.
So it's started. I'm watching. Notice I'm using Windows Admin Center for all of this, not gross old snap-ins from 1999. And time has gone by. It's now continuously replicating. If I go look in my event viewer here, which is in WAC, so I don't need to go wandering off to another tool, I can see that my replication started at 8:09 and it finished at 8:17. Eight and a half minutes is how long it took. That's all. So not three hours, not 24 hours, not 48 hours. Eight and a half minutes. Not too shabby. And if I was being, like, cool, I would have used RDMA and SMB Direct and 40 gigabit network cards here. It would have been even faster, but I'm giving DFSR a fighting chance because I don't want to be super rude.
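For a sense of scale, the timings quoted in the talk can be turned into rough ratios with a few lines of Python. This is a sketch under stated assumptions: it treats both runs as moving the same 400 decimal gigabytes, which may not hold exactly, since Storage Replica copies volume blocks rather than files and compression may have been on.

```python
def mb_per_second(size_gb: float, seconds: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return size_gb * 1000 / seconds


DATA_GB = 400                      # data set size quoted in the talk (assumed equal for both runs)
sr_seconds = 8.5 * 60              # Storage Replica initial sync: eight and a half minutes
dfsr_tuned_seconds = 3 * 3600      # tuned DFSR initial sync: about three hours
dfsr_untuned_seconds = 24 * 3600   # speaker's lower estimate for untuned DFSR

sr_rate = mb_per_second(DATA_GB, sr_seconds)
speedup_vs_tuned_dfsr = dfsr_tuned_seconds / sr_seconds
speedup_vs_untuned_dfsr = dfsr_untuned_seconds / sr_seconds

print(f"SR average rate:         {sr_rate:.0f} MB/s")
print(f"Speedup vs tuned DFSR:   {speedup_vs_tuned_dfsr:.1f}x")
print(f"Speedup vs untuned DFSR: {speedup_vs_untuned_dfsr:.0f}x")
```

On those numbers the Storage Replica sync averages about 784 MB/s and finishes roughly 21 times faster than the tuned DFSR run.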
All right, let's talk about just protecting a file. On my DFSR server, I'm going to copy this big file. It's about 14 gigabytes, and I'll just drop it into my replicated folder on the source machine. Here it's copying. Then it's going to be replicated over to the other server using DFSR. It's taking me, I don't know, 20 seconds to copy the file onto the machine. But it's not really protected yet, right? It's got to actually be copied over. So let's take a look at this thing. Yeah, last accessed about a minute ago. It's a helpful little timestamp. Let's go over and look on the other server and figure out how long it takes for this thing to show up. I'm going to take a look. It's still not here. It should start with an S. Let's just wait a while. We ended up having to wait about, well, about 10 minutes. When I open this file, you can see my last access time. It took quite a while for it to show up, actually.
Let's try this with Storage Replica. Now, this is not files anymore, so I can't go look at a file, right? It's blocks. So I'm going to use perf counters to let me eyeball the transfer of the same file. I'm going to add my log writes per second and my bytes per second sent. Then I'm going to do the same thing: just copy a file onto the Storage Replica replicated G: drive. Let's copy this thing over here. Here it goes. Now, I'm copying the file onto the disk, but really what am I doing, right? I'm replicating the file onto the other server simultaneously. So when this copy is done being on the disk here, it's done being on the other one as well. There's no 10 minutes. There's no "let me unlock the file." There's no "well, I'm waiting for the staging folder to clear out." None of that nonsense. The I/O here is the I/O on both machines. And you can see it's going. I'm getting pretty good throughput there.
And when we're all said and done, I'm waiting for this to finally finish flushing out. Here we go. 35 seconds, not bad. Not 10 minutes, not two hours, not a week. 35 seconds.
So here's the old log and the new log, CLFS log versus raw log, exact same operation. How much better is it than, say, Windows Server 2022? I copy my file. Everything's happening at the same time. You can see that I'm actually getting more consistent throughput and higher throughput on my raw one. If you haven't guessed by now, the top one is where raw is going. It's cranking and cranking, consistently cranking, no dropouts, moving along, doing about 500 megabytes a second. That's looking good. This, by the way, is synchronous replication this time. And it's done here, flushed, 32 seconds. The other one's going to keep going. It's still being protected pretty fast, way better than DFSR, but it's taking much longer to be done because this is the old log from Server 2022. And there it is, 48 seconds. So a vast improvement over the old Storage Replica, not just over DFSR.
So: better performance, better network usage, excellent disaster protection. It's in all editions of 2025. It's included; there's no extra price, there's no special subscriptions or anything. And I have written an entire article on how to do a migration from DFSR to Storage Replica. It's on the FileCab blog. I want to thank you for your time. Please enjoy the rest of Windows Server Engineering Summit.