And I'm very happy to share my experience of debugging KVM using Intel DCI technology. This is my first time presenting at the KVM Forum, so I will do a quick self-introduction. My name is Raymond Zhang. I worked at Intel from 2003 to 2016. In 2009, I joined a project based in Amsterdam. Our goal was to make a media/graphics GPU work in a Windows VM. At that time, it was quite difficult to make a graphics GPU work in a VM. Usually there is a very big driver for it inside the VM. After we passed the device through to the VM, we installed the driver, which is a binary, and there were random TDR blue screens. I spent significant effort on the debugging. After three months, I found the root cause in the MMU code of Xen's shadow memory logic. At that time, the only way to debug was print and log. Today, I will share how to use a debugger to do the debugging, which is much more efficient.

Here is the classic Xen architecture. There is a hypervisor underneath all the VMs. There is a special VM called Domain 0. It is privileged: it provides services and also manages the other VMs. For example, there is an ioemu process for each VM to do device emulation. Usually, we only let Domain 0 see the real hardware; all other VMs only see emulated, fake devices. Hyper-V from Microsoft has a similar architecture. There is also a hypervisor under all VMs. In Microsoft's terms, a VM is called a partition, and the parent partition corresponds to Xen's Domain 0. With Windows 10, Microsoft introduced a feature called Isolated User Mode, also known as Virtualization Based Security. If this feature is enabled, a hypervisor runs, and Windows 10 runs above it. There is also another special secure kernel running on top of the hypervisor. There will be a lot of VM exits when Windows accesses hardware, so it turns out that with this feature the performance is actually sacrificed.

I think KVM has a different architecture, and a better one. I believe KVM has a smarter design. From the architecture perspective, KVM combines Domain 0's kernel and the hypervisor into one; actually, they run in one address space. This structure has a lot of benefits. The host kernel, the counterpart of Domain 0's kernel, will not exit to a hypervisor when it accesses hardware. That is an advantage.

Some people may not agree with my previous diagram, so here I will show some code. Here is code from vmx.c. Actually, it is the code from Intel to support VT technology; we can see it as the VMX extension for KVM. This function contains a very important instruction: VMXON, an instruction of the CPU. It actually turns on the hardware support for VMs, for VT. Using the disassembly view of the debugger, we can see the instruction is inlined into the hardware enable function of the kvm-intel module. The VMXON instruction has a very special role in VT technology: the software that executes it first wins the hypervisor role. After that, if other software executes the instruction, it fails and is trapped by the hypervisor. This diagram shows the CPU's execution route in the VT environment; we will talk about it in more detail later.

We can set a breakpoint at the VMXON instruction. After we set the breakpoint, we let the target go and launch a KVM virtual machine on the Ubuntu target, for example. After we click the launch button, the breakpoint is hit, and using the call stack function we can see the call stack. From the call stack, we can see the Linux kernel invoking the KVM module to do hardware enable.
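As a rough userspace sketch of what drives this path, assuming only the public /dev/kvm ioctl interface (this is not QEMU's code, just a minimal illustration): creating the first virtual machine is what takes KVM down the hardware enable route, so opening /dev/kvm and issuing KVM_CREATE_VM should be enough to make the VMXON breakpoint fire.

```c
/* Minimal sketch (illustration only): creating the first VM is what leads
 * KVM to run its hardware enable path, ending in VMXON on the CPUs.
 * Assumes a Linux host with /dev/kvm; error handling is kept minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }

    /* Sanity check the API version exposed by the KVM module. */
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    /* With a breakpoint set on the VMXON instruction in the hardware
     * debugger, this ioctl is the point where the breakpoint should hit:
     * KVM enables VMX operation for its first VM. */
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0) { perror("KVM_CREATE_VM"); return 1; }

    printf("created VM, fd = %d\n", vm);
    return 0;
}
```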
Actually, it is using the Linux kernel's SMP mechanism to let all CPUs run the enable routine and turn on the feature. After KVM gains control, it begins invoking kvm-intel; actually, it goes through a function pointer registered into KVM. So we can see the hardware enable function is called, and it executes the VMXON instruction. After this instruction, the code that executed the instruction has won the hypervisor role. And according to this call stack, we can see the hypervisor code and the Linux kernel run in one address space. So it is safe to say the hypervisor and the Linux kernel are combined in the KVM architecture. Actually, the KVM name is very good: it means Kernel-based Virtual Machine. The kernel plays the hypervisor role. It's smart. That's a quick view of KVM.

So next I will talk about DCI, and then I will talk about how to debug KVM using DCI. Here is my debugging environment. Actually, I am traveling and giving this talk from a small hotel. Here are the machines I used. The laptop is running as the host. For the target, I run Ubuntu; I turned on the KVM feature and also installed Ubuntu inside a VM. The hardware of the target is an Intel low-power CPU. For the target, the BIOS is customized; I will explain the reason shortly. Between the target and the host, I use a USB 3 cable to connect them.

That's DCI technology. DCI stands for Direct Connect Interface. The name is very easy to understand: it means we can connect the host with the target very easily, very directly. Before DCI, we had to open the chassis to reach the ITP connector. Before DCI, I used ITP technology to do low-level debugging. For ITP, the connector is on the motherboard; it is not exposed outside the chassis, so we had to open the chassis. DCI was actually introduced with Skylake in 2016. It is a very good technology: it brings the powerful ITP/JTAG debugging capability out to a USB connector, which is very easy to connect. Inside the silicon, there is a new debug component for DCI. It works with the JTAG scan chain and transfers the data to the xHCI, the USB 3 controller. Together they expose ITP technology through a convenient USB port.

Here is the device list shown by the DCI debugger; actually, this is Intel System Studio. I think the beauty of the DCI debugger is that you can see a lot of internal devices with it. For example, here we can see the Intel cores, also the uncore part, also the integrated devices inside the SoC. For people doing low-level development, for example OS development or kernel development, I believe it is very useful.

Actually, there are two types of DCI. One is called Boundary Scan Side Band (BSSB) hosted DCI. It needs a small box called the Closed Chassis Adapter (CCA) from Intel. The other way is called USB hosted DCI; it only needs a USB 3 cable. The second way is low cost and convenient, and it is the one I am using. The constraint of the second way is that it can only do OS-level debugging; it cannot debug the early boot phase. With the CCA, you can debug early boot. Here is the web page of the CCA tool. I have to mention that to do DCI debugging we usually need a customized BIOS. The reason is that, for security concerns, Intel advises OEMs to turn the feature off in the BIOS and lock it. After it is locked, it cannot be turned on again without a power cycle. It means that if it is turned off inside the BIOS, there is no way to turn it on from the OS. That's why we usually need customized firmware.
After introducing DCI, I will now share my experience of debugging KVM using DCI. I will go through some typical debugging scenarios. For example, VM creation. There are a lot of steps to create a VM; I will give some examples. Here is a breakpoint on the vmx create-vcpu function. When this breakpoint hits, we can examine the details of creating a VCPU. After creating the VCPU, KVM will create the virtual MMU. Virtual MMU is just a simple name: it sets up the memory paging facility for the VM, for example turning on the EPT technology of the hardware. If there is no EPT support in the hardware, I believe it needs a special shadow page mechanism in software. After creating the VCPU and the virtual MMU, KVM will create a local APIC for the VM. APIC means Advanced Programmable Interrupt Controller; this is a virtual interrupt controller used to deliver interrupts into the VM. After that, KVM will also create some facilities to emulate Hyper-V's VMBus. I believe this is to speed up device communication with the VMs. Ten years ago, we usually emulated devices using the real hardware protocols, but that can be slow; VMBus is a special protocol defined for VMs and has better performance. After that, KVM will create a virtual PIT for the VM. It is for the timer interrupt. The timer interrupt is a classic facility of the PC, and it is actually still used today so that the machine has clock support. So that's about VM creation.

Now I will talk about VM exit. Actually, after a VM is created, the CPU will run the VMLAUNCH instruction to do a VM entry. Then the CPU enters guest mode and runs the instructions inside the VM. A normal, unprivileged instruction is executed by the CPU directly; that is called direct execution. But when the CPU meets a privileged instruction or an IO instruction, it will exit. That is called a VM exit. So, to some extent, when a CPU is running a VM it runs in such a loop: it enters the VM, and if it meets a special instruction, also called a sensitive instruction, it exits to the hypervisor; the hypervisor handles the issue and then lets the CPU enter the VM again. That's the loop. So here VM exit does not mean the VM shuts down; it means the CPU exits from the VM for some reason. One typical reason is IO access. I think this is also the primary way to stop a VM from touching the real hardware directly. In a virtualized environment, there are several VMs and they may share one real device, so when they access the hardware, the CPU exits and the hypervisor does the management. Here is a breakpoint in the debugger; we can see the call stack and the source code.

Actually, there are two types of IO. The first is port IO. It is old and inefficient; it comes from the classic PC design. Nowadays memory-mapped IO is more common and has better performance. For the device emulation, actually today we use QEMU for both KVM and Xen. In QEMU, we register read and write function callbacks for the IO ports. So when we create a new virtual device, the main job is to develop the read/write callbacks. There is an IO port table that manages all the IO ports and their registered callbacks.
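As a rough standalone sketch of this port IO exit flow, assuming nothing more than the public /dev/kvm ioctl interface (this is not QEMU's code, just a minimal illustration): one VM and one VCPU, a few bytes of 16-bit guest code that write to port 0x71 and halt, and a host run loop that handles the resulting KVM_EXIT_IO, which is where a real VMM would dispatch to the device's registered read/write callback.

```c
/* Minimal KVM API sketch (illustration only): a tiny real-mode guest does
 * an OUT to port 0x71 (the CMOS data port, chosen just to mirror the
 * talk's example) and then HLT.  The host run loop handles KVM_EXIT_IO,
 * playing the role of the registered read/write callback. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
    /* Guest code: mov al, 0x42 ; out 0x71, al ; hlt */
    const unsigned char guest_code[] = { 0xb0, 0x42, 0xe6, 0x71, 0xf4 };

    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    /* One page of guest memory at guest physical address 0. */
    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, guest_code, sizeof(guest_code));
    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0,
        .memory_size = 0x1000, .userspace_addr = (unsigned long)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    /* Create the VCPU and map its shared kvm_run area. */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* Start in 16-bit real mode, executing from address 0. */
    struct kvm_sregs sregs;
    ioctl(vcpu, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    ioctl(vcpu, KVM_SET_SREGS, &sregs);

    struct kvm_regs regs;
    memset(&regs, 0, sizeof(regs));
    regs.rip = 0;
    regs.rflags = 2;                      /* bit 1 of RFLAGS is always set */
    ioctl(vcpu, KVM_SET_REGS, &regs);

    for (;;) {
        ioctl(vcpu, KVM_RUN, 0);          /* VM entry; returns on a VM exit */
        switch (run->exit_reason) {
        case KVM_EXIT_IO:                 /* the port IO exit discussed above */
            if (run->io.direction == KVM_EXIT_IO_OUT)
                printf("guest wrote 0x%x to port 0x%x\n",
                       *((unsigned char *)run + run->io.data_offset),
                       run->io.port);
            break;                        /* resume the guest */
        case KVM_EXIT_HLT:
            printf("guest halted\n");
            return 0;
        default:
            printf("unhandled exit reason %d\n", run->exit_reason);
            return 1;
        }
    }
}
```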
For example, here is a VM exit for port IO. The CPU does the port IO access inside the VM and exits. After the exit, KVM handles the exit and invokes kvm-intel; that is the extension provided by Intel. kvm-intel checks the exit reason and invokes the fast PIO function, and then the IO port list is walked to find the read/write callback, which is invoked. Here we can see the port address is 0x71; actually, it is the classic CMOS device, and there is a registered callback for it.

Then I will talk about another example, memory-mapped IO. Actually, the APIC is accessed through MMIO read and write callbacks. Here is the call stack in the debugger. We can see it is a write access: it goes through the KVM MMIO write path and is dispatched to the APIC function callback. From the parameters, we can see this is a special address; actually, it is an APIC register. The APIC spec defines an address range just beneath four gigabytes, and it is reserved for the APIC. So here are some useful breakpoints for debugging KVM, for reference.

Here is a quick real case. Actually, when I turned off the target machine, Ubuntu took a long time at the power-off screen. It seemed to hang somewhere, and without a debugger it is very hard to debug this. So I used the debugger to break in, and the target broke into the debugger. I checked CPU 0 first. From the call stack, it is servicing a reboot interrupt, and from the source code of the reboot interrupt handler, it is doing a stop-CPU operation. Actually, this function sends an IPI to let all the other CPUs respond to the command. When we switch to CPU 2, we can see it has panicked and is stuck there. That's why it hangs: when we turn off Ubuntu while a VM is still running, some special call triggers a panic of the Linux kernel. Because of this panic, there is a delay inside the kernel, and CPU 0 waits about one minute. So that's all for my talk. I'm very thankful for your attendance, and see you next time.