Everyone, welcome to Windows Server Summit 2024. Today we'll be talking about supercharging your data centers with Hyper-V and GPUs. My name is Ifyap Wachit and I'm a product manager on our CoreOS team, and I'll be presenting with Nicole Warren, who is also a product manager on our CoreOS team. Today we'll be covering GPU DDA in a failover cluster, GPU partitioning or GPU-P, GPU-P live migration, and GPU-P availability.

So let's talk about GPU DDA in failover clusters. Failover clusters are groups of independent nodes that work together to increase the availability of clustered services. If one of the nodes in the cluster fails, another node can step in and run the workloads from the failed node. GPU DDA, or discrete device assignment, allows us to assign one or more entire physical GPUs to a single virtual machine. This empowers our customers to run their high-compute workloads on VMs, which is essential as AI becomes increasingly prevalent. We are introducing GPU DDA to our failover clusters so that, in the event of a server failure, workloads can be quickly and automatically restarted on another node within the failover cluster, allowing our customers to complete high-compute tasks in the face of hardware failures. We're excited to include GPU DDA for failover clusters in Windows Server 2025 Datacenter. Now I'll hand it over to Nicole.

Thanks, Ifyap. We'll now dive into GPU partitioning, including showcasing a few demos of these upcoming features. GPU-P, or GPU partitioning, is a new feature coming to Windows Server that was previously added to Azure Stack HCI with the 22H2 OS release. GPU-P allows users to share a single physical GPU device across multiple virtual machines. With GPU-P, each VM gets a dedicated portion of the GPU's capacity instead of an entire GPU. Since keeping workloads secure is a top priority, GPU-P uses the SR-IOV interface to create a hardware-backed security boundary for each VM. This ensures each VM has access only to the GPU resources dedicated to that specific VM, preventing unauthorized access from other VMs.

Let's now look at a few ways GPU-P brings value and benefits to our customers. Existing virtualization technology is insufficient for the modern era. GPUs are very expensive and supply more horsepower than a single workload typically requires. GPU workloads were previously limited to niche applications, but now AI backend services and virtual desktops for data scientists are more common in private data center workloads. RemoteFX vGPU had made it possible for multiple VMs to share a physical GPU, but it was removed as an option for users with Windows Server 2019 and disabled in 2020 on all applicable Windows platforms. This was because RemoteFX vGPU was susceptible to security vulnerabilities that were architectural in nature. As a result, customers were once again left needing to give an entire GPU to a VM. GPU-P reestablishes the ability to share a physical GPU among multiple VMs while providing only the capacity each VM needs for its particular workload, enabling cost-optimized and efficient configurations.

This demo showcases GPU DDA and GPU-P assignment using Windows Admin Center. Hello everyone. Today I'm going to quickly demo the new GPU tool for Windows Server 2025 Datacenter.
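For anyone who prefers scripting, the core DDA operations that the WAC tool automates can also be performed with the documented Hyper-V and PnpDevice PowerShell cmdlets on a host. The sketch below is only illustrative: the VM name and device name are placeholders, it assumes a single matching GPU, and the MMIO values are examples that vary by GPU, so use the sizes your IHV recommends.

```powershell
# Give the VM enough memory-mapped I/O space for the GPU (example values; GPU-specific)
Set-VM -VMName "DDA-VM01" -GuestControlledCacheTypes $true `
    -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 33280MB

# Find the GPU's PCIe location path (adjust the friendly name to match your device)
$gpu = Get-PnpDevice -FriendlyName "NVIDIA A2" -Status OK
$locationPath = (Get-PnpDeviceProperty -InstanceId $gpu.InstanceId `
    -KeyName DEVPKEY_Device_LocationPaths).Data | Where-Object { $_ -like "PCIROOT*" }

# Disable the device on the host, dismount it, and assign it directly to the VM
Disable-PnpDevice -InstanceId $gpu.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
Add-VMAssignableDevice -LocationPath $locationPath -VMName "DDA-VM01"
```

In the clustered pool scenario shown in the demo, WAC drives the pool creation and assignment end to end; the commands above are the standalone-host equivalent.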
So as you can see here, we are currently managing this Windows Server 2025 Datacenter server in WAC, and if you scroll down, there is a new GPU tool in WAC that will allow you to manage the GPUs on this server, similar to an Azure Stack HCI server. From this tab we can see that there is an NVIDIA A2 GPU that is currently set up for DDA assignment. So let's go ahead and create a DDA pool. Go ahead and click on creating a pool, set the pool name, and include the NVIDIA A2 in this pool. You can choose the assignment options here and then hit Save. The creation of the pool is successful. Now we can go ahead and assign a virtual machine to this GPU resource pool. Go ahead and select a VM. Here you can also set a specific memory-mapped I/O space, low and high; I'm going to keep it as is. And as you can see, the VM is now successfully assigned to this NVIDIA A2 GPU pool. You can also unassign this virtual machine from the GPU pool and refresh the table, and as you can see, the GPU is now free and no longer assigned to that virtual machine. You can also delete this GPU pool, and as you can see, the GPU DDA pool is successfully deleted.

Here we have another server, also running Windows Server 2025 Datacenter. Let's go ahead and scroll down and click on the new GPU tool. We can see that this server currently has an NVIDIA A2 that is set up for GPU-P, or GPU partitioning. So let's head over to the GPU partitions tab. We can see that the NVIDIA A2 is set up with a partition count of eight. Let's go ahead and repartition it to a count of four. Go ahead and click Configure partition, and you can see that we successfully configured the GPU into four partitions. Give it a second and the partition count will update to four. Next, let's assign a partition to a virtual machine. We're going to go ahead, choose the server, choose a virtual machine, and hit Assign partition. As the operation goes through successfully, I'll give it a second for the data table to update, and now this virtual machine is assigned to this partition. Let's go ahead and unassign this partition from the virtual machine. I'm going to hit the Unassign partition button, choose the virtual machine that currently has the partition, and click Unassign. As the operation goes through successfully, give it a second and the data table should update, with the virtual machine removed from this partition as well as from this NVIDIA GPU.

GPU-P is easy to use. It only requires installation of a supported GPU-P driver; the list of currently supported drivers will be shared later in the presentation. GPU-P can be enabled using GPU settings, WAC, and PowerShell, and these environments can be used to configure the GPU partition count and to assign and unassign GPU partitions. There is also published public documentation from Microsoft on setting up GPU partitions on Azure Stack HCI, and we will be publishing additional documentation for GPU-P on Windows Server and GPU-P live migration as part of the upcoming 24H2 and Windows Server 2025 OS releases.

Next, we'll be showcasing a demo that shows how to set up GPU partitions using WAC. This demo will walk through assigning and unassigning a GPU partition, as well as partition counts, using WAC. Near the end of the demo, it provides a visual comparison of running a workload on a VM without GPU resources compared to running the same workload with GPU-P enabled.
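As mentioned, GPU-P can also be configured with PowerShell rather than WAC. Before the next demo, here is a minimal sketch of that flow using the documented Hyper-V cmdlets; the VM name is a placeholder, it assumes a single supported GPU in the host, and the VM must be powered off when a partition is attached or removed.

```powershell
# Look at the partitionable GPU on this host (assumes one supported GPU)
$gpu = Get-VMHostPartitionableGpu

# Change the partition count, for example from 8 down to 4 as in the WAC demo
Set-VMHostPartitionableGpu -Name $gpu.Name -PartitionCount 4

# Attach a partition to a stopped VM, then verify the assignment
Add-VMGpuPartitionAdapter -VMName "GPUP-VM01"
Get-VMGpuPartitionAdapter -VMName "GPUP-VM01"

# Detach the partition again
Remove-VMGpuPartitionAdapter -VMName "GPUP-VM01"
```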
I'm quickly creating a host pool in the Azure portal. In this tab I'm giving all the relevant host pool information. In the second tab I'm creating the virtual machines on Azure Stack HCI by selecting a custom location and providing the VM configuration details, network, domain join credentials, and local admin credentials. I am creating three virtual machines for this demo scenario. I'll add a workspace and then kick off the deployment, which will take a couple of minutes. Now the host pool is created, and you can see that there are three virtual machines in this host pool which will act as session hosts. I will attach GPU partition resources to two of these virtual machines and leave one of them as it is. Coming back to WAC, I've already assigned AVD HCI VM 2 and 3 to two GPU partitions, and AVD HCI VM 1 doesn't have any GPU partitions. Now I'll show you what the end experience is, based on whether the user who logs in gets a VM with GPU resources or without.

I'm using the Windows App to log in as two AVD users on two separate session hosts. On the left is VM 1 without any GPU resources, and I'm logged in as AVD user 1. On the right is VM 2 with a GPU partition assigned, and I'm logged in as AVD user 2. On VM 1 on the left, I can see that frames per second are around 5 with 500 fish, whereas in comparison, on VM 2, which has a GPU partition, frames per second are around 31 or 32 with 30,000 fish. So you can see how the user experience can vary based on whether the user gets access to a VM with GPU resources or without.

Now let's take a look at how multi-session capabilities work with GPU partitioning. In this example I have two users, user 2 and user 3, who are both accessing their AVD desktop sessions on the same virtual machine, and this virtual machine has one GPU partition assigned to it. As you can see in the aquarium demo, frames per second are consistent across both user sessions, so you can see how GPU partitioning plus multi-session capabilities come together to bring an improved and consistent experience to our end users.

As part of the 24H2 and Windows Server 2025 OS releases, we will also be adding live migration for GPU-P. Live migration allows customers to perform maintenance and updates on their VM fleets with minimal workload impact; it only requires a brief workload pause while the final memory transfer pass is moved from the source to the target VM. Live migration is key to keeping data center fleets up and running, allowing customers to keep their workloads moving rather than stopping for unexpected or planned maintenance events. This next demo will showcase a GPU-P live migration being performed during a gaming session, to demonstrate live migration on a high-intensity workload.

For this demo we will use two homogeneous compute nodes. Each of them has 512 GB of RAM, two 32-core CPUs, and two NVIDIA A2 GPUs. The GPUs on both nodes have been partitioned into two layouts: one into two halves that contain approximately 8 GB of VRAM each, and another into 16 partitions that contain approximately 1 GB of VRAM each. The node on the left contains the test VM that we will be using in this demo. These two nodes form a hyperconverged cluster, and the clustered test VM's virtual storage is placed on a cluster shared volume. Since that volume is available on all nodes, storage migration will not be performed while live migrating this VM. Once the VM is started, the 8 GB GPU partition is attached to the VM as configured. Once the VM receives its IP address, we will RDP into the VM.
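A quick aside before the demo continues: the manual live migration that will be triggered from the cluster UI in a moment can also be started from PowerShell with the FailoverClusters module. A minimal sketch, with hypothetical VM role and node names:

```powershell
# Failover clustering cmdlets (present on cluster nodes, or install the RSAT tools)
Import-Module FailoverClusters

# Live migrate the clustered VM role to the other node;
# the VM resumes on the target after the brief blackout phase
Move-ClusterVirtualMachineRole -Name "GPUP-TestVM" -Node "Node2" -MigrationType Live
```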
We configured the VM with 64 virtual processors and 128 GB of RAM. The 8 GB GPU partition is directly assigned to the VM, which can be seen here in Task Manager. For this demo we will be using the graphics benchmark utility of the game Cyberpunk 2077 to put a high load on the GPU partition assigned to the VM. We are configuring the lowest possible values for all the graphics settings to achieve the highest FPS for this VM configuration. We will initially run the benchmark utility without live migration to gather control data. The VM's guest run time percentage can be found in Perfmon, while the overall utilization of the node's CPU is displayed using Task Manager. Throughout the test we can see that both root partitions are relatively idle, while almost all of the CPU utilization is coming from the test VM. The initial bump in the guest run time can be correlated to the stutter while loading the benchmark scene. The live FPS counter of the scene can be found at the top left corner of the remote desktop connection to the VM. We are able to achieve approximately 50 FPS on average, reaching up to 76 FPS.

Now let's rerun this benchmark scene with the same settings, but also perform a live migration of the VM. I will be using the cluster UI to manually migrate the VM to the node on the right; the cluster will also live migrate this VM if its node is rebooted or drained. Additional CPU usage can be seen due to kernel activity, which can be correlated to the TCP send and receive operations on the VM data during live migration. The compression and decompression of the VM data also adds to the CPU usage. We can see that the FPS performance is pretty close to the control data. The stutter just seen in the remote desktop connection to the VM is caused by the live migration blackout phase. We now find the VM resuming on the target node and completing the remainder of the benchmark scene. We are able to achieve approximately 43 FPS on average, reaching up to 89 FPS. The total duration of the live migration operation was about 30 seconds, while the VM blackout was about 3 seconds. This is only early performance data with NVIDIA's alpha driver supporting GPU-P live migration; we will use this data to optimize live migrations of GPU-P VMs in customer environments. We are also tracking an important performance fix in development right now that will reduce the blackout duration by roughly 50%. Live migration of GPU-P VMs will be coming to Azure Stack HCI 24H2 clusters, Windows Server 2025 Datacenter clusters, and the Azure Host OS in 2024.

Oh, this is great. But where and when will these features be available? GPU-P is already enabled in the Azure Stack HCI 22H2 OS release. GPU-P will now be coming to the Windows Server 2025 OS release later this year, and live migration for GPU-P will be available with both the Windows Server 2025 and Azure Stack HCI 24H2 OS releases. The NVIDIA GPUs that will support GPU partitioning are the A2, L4, A10, A16, A40, and L40 GPUs. The number of partitions supported is based on the specific driver. It's recommended that you work with your OEM partners and IHVs to plan, order, and set up the systems for your desired workloads with the appropriate configurations and necessary hardware.
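Because the supported partition counts depend on the installed driver, one quick way to see what a given host exposes is to query the partitionable GPUs from PowerShell. A minimal sketch; the exact properties and output will vary by GPU and driver version:

```powershell
# List each partitionable GPU with its current and driver-supported partition counts
Get-VMHostPartitionableGpu |
    Select-Object Name, PartitionCount, ValidPartitionCounts |
    Format-List
```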
First, a big thank you to all the Microsoft and NVIDIA teams that were involved with these projects, and also a special thank you to those of you who joined this session. It's exciting to be able to share the work our teams have been investing in, and we hope that you will be excited about these features as well. We look forward to hearing your feedback and questions, and be sure to fill out the evaluation. Thanks again.