Hello, everyone. Today I'll present work related to virtual interrupts on the RISC-V architecture. Virtual interrupts are an integral component of any virtualization solution. However, aside from the capability of directly delivering interrupts from pass-through devices to a VM, we have not seen much development in this area. Here, we'd like to take the opportunity of the emerging RISC-V architecture to introduce some new ideas to further improve virtual interrupts. In this presentation, I'll first provide some background on how virtual interrupts are processed in the current RISC-V KVM architecture, to motivate our proposed improvements. Next, I'll talk in more detail about our improvements, which include the design of a virtualization-aware interrupt controller and direct delivery of virtual interrupts to a VCPU. Then I'll show the detailed implementation in KVM to handle the various kinds of virtual interrupts and VCPU state changes. I'll also show some performance results. Lastly, I'll briefly talk about future work and directions. Let's move on to the background and motivation. The current hypervisor extension on RISC-V does not include any support for accelerating virtual interrupts; all virtual interrupts are implemented with a trap-and-emulate approach. Three types of virtual interrupts are of concern, corresponding to three types of physical interrupts: the virtual timer interrupt, the virtual supervisor software-generated interrupt, and the virtual supervisor external interrupt. The current flow of the virtual timer interrupt is shown here. Whenever the VM kernel needs to set up a timer interrupt, it traps to KVM, which uses an hrtimer and the timer driver, which finally traps to machine mode to set up the physical timer. When the timer interrupt fires, the hart switches to machine mode first.
OpenSBI then injects the timer interrupt into the HS-mode host kernel, which in turn injects it into the guest kernel. The entire process involves multiple traps and components, and many cycles are spent emulating the timer interrupt for the guest. The flow of the virtual software-generated interrupt is shown here. To send a virtual software-generated interrupt, the guest VM calls the SBI interface emulated by KVM, which causes a trap to the host. The host saves the interrupt for the target VCPU and notifies the hart that that VCPU is running on. To perform the notification, the host calls the real SBI, which causes another trap to machine mode. When the machine-mode interrupt arrives at the other hart, that hart first switches to machine mode; OpenSBI injects a software interrupt into the host kernel. The host kernel then realizes there is a software interrupt pending for the guest and injects it when resuming the guest. So the virtual software interrupt also introduces a number of traps, resulting in the same situation as the virtual timer interrupt. On the current KVM, to deliver an interrupt from an emulated virtual device in QEMU, the first step is to perform the necessary emulation in the PLIC model. Next, to notify the guest VM, the PLIC emulation traps to the host kernel. The KVM module in the host kernel then saves the interrupt info for the VCPU and uses a host software-generated interrupt to notify the hart on which the VCPU is executing. The VCPU traps and quickly resumes execution, picking up the interrupt during the resumption. We see that, again, a number of traps are needed in this process. During the guest's handling of the interrupt from the virtual device, still more traps are incurred; the main reason is that MMIO accesses to the emulated PLIC trap to the host.
There are further switches in the host when the PLIC emulation code in user space interacts with the host kernel. MMIO is needed at least to acknowledge the pending interrupt and to signal the end of interrupt handling, which means at least two traps. So we notice that all interrupt processing involves a number of traps. Meanwhile, other architectures such as x86 and ARM have been providing direct delivery of certain kinds of virtual interrupts to the VCPU for a while. Direct delivery of virtual interrupts avoids excessive traps, reducing overhead and improving overall performance. The question is: can we have something like that for RISC-V? With that, let's move on to our work on RISC-V. We extended the interrupt controller of the current RISC-V architecture to support direct delivery of all kinds of virtual interrupts, including the virtual timer interrupt, the virtual software-generated interrupt, and the virtual supervisor external interrupt. We have also implemented the necessary support in KVM to validate the functional design of the extended interrupt controller. Note that our extension ideas are not dependent on a particular interrupt controller; they can be adapted to future interrupt controllers as needed. The existing interrupt controller on RISC-V consists of two parts, the CLINT and the PLIC. The CLINT mainly contains a local timer register called mtimecmp. The PLIC acts as the routing unit, so to speak, for external interrupts and software-generated interrupts; it contains various control and status bits for the interrupt sources. I'll skip the details here; please refer to the specs if you're interested. We have extended the CLINT and the PLIC with a set of CSRs to support direct delivery of virtual interrupts. For each type of virtual interrupt, we provide CSRs according to its functional requirements.
For example, for timer interrupts, we provide a CSR each for VS mode and HS mode that acts as a dedicated clock event source for that mode. These CSRs can be accessed directly from the mode they are intended for without trapping; the guest VM can directly set a timer interrupt with vstimecmp, for example, without trapping to the host. All the other CSRs follow the same idea. We extended the PLIC to handle the virtual interrupts that are not hart-local, namely the virtual supervisor software-generated interrupts and external interrupts. The new registers contain structures used for routing virtual interrupts while respecting the host's scheduling decisions. For example, for each hart X we provide a corresponding IFMAPX register. The IFMAP register holds the (VMID, VHartID) pair identifying the VCPU that is currently running on physical hart X. When the PLIC needs to deliver a virtual software-generated interrupt, it looks up all the IFMAP registers in parallel to locate the target VCPU. If any slot matches, the PLIC knows that the target VCPU is on the hart corresponding to that slot. The advantage of this design is that instead of contending on a single shared resource, such as a global table or some kind of command queue, each hart only writes to its own slot without taking any locks, and the hardware can perform the lookup in parallel. The other registers are designed with the same idea. To directly deliver virtual interrupts, the interrupt controller needs to be able to directly set the pending bits for the virtual interrupts. Currently, such pending bits are only set by host software. We change that so that when V equals 1, the guest receives the interrupt directly. The V bit is RISC-V nomenclature indicating whether the hart is currently running a VCPU. To minimize possible delays, we also consider delivering the interrupts when V equals 0.
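To make the IFMAP idea concrete, here is a small C sketch of the per-hart slots and the parallel match. This is a toy model, not the actual hardware or our KVM code; the struct layout, names, and the sequential scan standing in for the hardware's parallel comparison are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS 4

/* One slot per physical hart: which (VMID, VHartID) is running there.
 * The field names model the register contents, not a real layout. */
struct ifmap_slot {
    bool valid;
    uint32_t vmid;
    uint32_t vhartid;
};

static struct ifmap_slot ifmap[NR_HARTS]; /* zero-initialized: all invalid */

/* Each hart writes only its own slot (on vcpu_load/vcpu_put), so no lock
 * is needed: there is no contention on a shared table or command queue. */
static void ifmap_set(int hart, uint32_t vmid, uint32_t vhartid)
{
    ifmap[hart] = (struct ifmap_slot){ true, vmid, vhartid };
}

static void ifmap_clear(int hart)
{
    ifmap[hart].valid = false;
}

/* The hardware compares all slots in parallel; a sequential scan gives
 * the same answer. Returns the hart running the target VCPU, or -1 if
 * that VCPU is scheduled off. */
static int ifmap_lookup(uint32_t vmid, uint32_t vhartid)
{
    for (int hart = 0; hart < NR_HARTS; hart++)
        if (ifmap[hart].valid &&
            ifmap[hart].vmid == vmid && ifmap[hart].vhartid == vhartid)
            return hart;
    return -1;
}
```

A miss (-1) is exactly the case where the notification-interrupt fallback for a scheduled-off VCPU kicks in.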
When a virtual interrupt arrives on a hart while V equals 0, the host still handles it. In this case, the virtual interrupt goes along the same handling path as the existing approach, so our design is as good as the existing approach on the slow path. Note that conceptually this requires differentiating interrupts from the same source that are handled by different entities: even if the host claims the virtual interrupt, the handling is not finished, and the virtual interrupt should still remain pending for the guest. So to differentiate the pending states of the same virtual interrupt for the hypervisor and the guest, we introduce additional bits. Different types of interrupts need to be treated differently. We don't need an extra bit for the virtual timer interrupt, because the hypervisor never really claims a virtual timer interrupt: it is level-triggered, so having separate hip.VSTIP and vsip.STIP bits does not really make sense. Eventually the timer clears any pending bits once its compare value is updated to represent a future time. We do need to separate the enable bits, though: vsie.STIE is no longer an alias of hie.VSTIE. For VSSI, we introduced additional bits by separating vsip.SSIP from its alias relation with hip.VSSIP, and we also separate the enable bits. For the virtual supervisor external interrupt, we use a separate notification interrupt to represent the host's side, so the separation is completed by the pending bit of that notification interrupt. To support the interrupt controller extension above, KVM's main tasks are to handle interrupts when V equals 0 and to maintain a consistent interrupt context for each VCPU. A VCPU in KVM is essentially in one of the three states shown in this state transition diagram; KVM's work concentrates on the right portion.
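The pending-bit separation for a VSSI can be modelled in a few lines of C. This is only an illustrative model of the state, not the real CSR encoding: the point it demonstrates is that, once vsip.SSIP is no longer an alias of hip.VSSIP, the host claiming its side leaves the guest's pending state intact.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the separated pending bits for one VSSI. */
struct vssi_bits {
    bool hip_vssip;  /* host-visible side of the interrupt */
    bool vsip_ssip;  /* guest-visible pending bit          */
};

/* The interrupt controller delivers a VSSI to a hart; v is the hart's
 * V bit (true means a VCPU is currently executing on it). */
static void deliver_vssi(struct vssi_bits *b, bool v)
{
    b->vsip_ssip = true;      /* pending for the guest in any case   */
    if (!v)
        b->hip_vssip = true;  /* V=0: the host must handle it first  */
}

/* Host handles (claims) its side. The guest side must stay pending,
 * because the guest has not yet run its own handler. */
static void host_claim_vssi(struct vssi_bits *b)
{
    b->hip_vssip = false;
}
```

With aliased bits, the host's claim would have wiped out the guest's pending state; the separation is what keeps the slow path (V equals 0) correct.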
When the VCPU is in the paused or scheduled-off state, KVM handles the virtual interrupts, saving them for injection if necessary. When the VCPU transits between those two states, KVM performs the necessary context maintenance in the kvm_vcpu_put and kvm_vcpu_load functions. The reason the host handles virtual interrupts for the guest when the VCPU is merely paused, not scheduled off, is that the interrupt controller still regards the hart as having that VCPU running: the IFMAPX register is only updated in kvm_vcpu_put and kvm_vcpu_load. So virtual interrupts still arrive even though the VCPU is not executing. Next, we'll show how KVM performs these tasks for each kind of virtual interrupt. That brings us to the actual trapless virtual interrupts. For virtual timer interrupts, to set a timer, the guest writes the vstimecmp CSR, via its stimecmp alias. The host writes vstimecmp directly when switching VCPU context. When a timer interrupt fires with the VCPU in the running state (V equals 1), hip.VSTIP is set and the guest handles it directly. In the paused state (V equals 0), hip.VSTIP is still set, but this time the host handles it; it does so to allow, for example, a necessary priority adjustment for scheduling. In the scheduled-off state, the host tracks the VCPU's timer using an hrtimer, the same as the existing approach, so no virtual timer interrupt fires on the hart. During the VCPU state transitions, KVM needs to maintain consistency. On the transition from the running state to the paused state, KVM saves the state of the vstimecmp CSR, so that if the VCPU entered the paused state by executing a WFI instruction, the host has an up-to-date view of whether the guest has the interrupt enabled, and can bring the VCPU out of the vcpu_block function later when an interrupt intended for the guest arrives.
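The context maintenance on the paused-to-scheduled-off transition can be sketched as follows. This is a toy model of what kvm_vcpu_put and kvm_vcpu_load have to keep consistent, not the actual KVM code; all field and function names are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three VCPU states from the state transition diagram. */
enum vcpu_state { VCPU_RUNNING, VCPU_PAUSED, VCPU_SCHED_OFF };

struct vcpu_ctx {
    enum vcpu_state state;
    uint64_t hw_vstimecmp;    /* value live in the CSR while loaded */
    uint64_t saved_vstimecmp; /* software copy while scheduled off  */
    bool in_ifmap;            /* advertised to the PLIC as running? */
};

/* Models kvm_vcpu_put: paused -> scheduled off. */
static void model_vcpu_put(struct vcpu_ctx *v)
{
    v->saved_vstimecmp = v->hw_vstimecmp;
    v->in_ifmap = false;  /* PLIC now raises notifications instead */
    v->state = VCPU_SCHED_OFF;
}

/* Models kvm_vcpu_load: scheduled off -> paused. */
static void model_vcpu_load(struct vcpu_ctx *v)
{
    v->hw_vstimecmp = v->saved_vstimecmp;
    v->in_ifmap = true;   /* interrupts target this hart again */
    v->state = VCPU_PAUSED;
}
```

The in_ifmap flag is what explains why interrupts for a merely paused VCPU still land on the hart: the slot is only cleared in the put path.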
For the transition between the paused and scheduled-off states, KVM saves the value of the vstimecmp CSR and uses the host hrtimer facility to track the timer set by the guest, as shown in these lines on the slide. The host needs to temporarily disable its own handling of the VS-mode virtual timer interrupt, because if the value being saved or restored happens to be in the past, the host must avoid handling the same timer interrupt twice. This is because we still maintain the level-triggered semantics of the timer interrupt, the same as defined on the physical platform. So with the extended interrupt controller and the KVM support, the virtual timer interrupt can now be delivered directly to the VM kernel while the VCPU is running. When the VCPU is paused, the interrupt is delivered to the host, which immediately injects it into the VCPU. When the VCPU is scheduled off, the host uses an hrtimer to track the virtual timer; the hrtimer uses the new timer dedicated to HS mode, avoiding traps to machine mode. So that's the virtual timer interrupt. For the virtual supervisor software-generated interrupt, when a guest VCPU needs to send a software-generated interrupt to another VCPU, it writes the sgenipi CSR, specifying the target VCPU ID intended to receive the virtual software-generated interrupt. After the write, the PLIC receives the target VCPU ID and looks up all the IFMAP registers to see whether one contains that VCPU ID. If found, it delivers the VSSI to the hart whose slot N contains the matching VCPU ID. That VCPU can be either in the running state or the paused state: if it is running, the guest takes the interrupt directly and handles it; if it is paused, the host takes the interrupt and injects it into the VCPU.
If no slot contains the specified VCPU ID, a notification interrupt is delivered to a host hart for later injection; the target VCPU must then be in the scheduled-off state. KVM's job is again to maintain context for virtual software-generated interrupts across the same state transitions. When the VCPU moves between the running and paused states, KVM synchronizes the pending state of the VSSI with the hardware VSSI pending register. Recall that, unlike for virtual timer interrupts, we separated the pending bits for the guest and the host. When the VCPU is in the paused state, the host handles the VSSIs it receives, because the PLIC believes the target VCPU is still running on that hart. When the host schedules a VCPU off or on, it updates the corresponding IFMAP register. When the target VCPU is scheduled off, the PLIC sends the notification interrupt so that the host can wake the target VCPU and resume its execution. With the notification mechanism in the interrupt controller and the KVM support, the VSSI can now be delivered directly to the VM kernel while the VCPU is running, avoiding traps to the host. When the VCPU is not currently running, the host receives the interrupt and injects it into the VCPU, the same as the current solution. It is similar for the virtual supervisor external interrupt. When a virtual machine is created, the host allocates a piece of memory for the pending bits of that VM's external interrupts. When an external device needs to send a virtual supervisor external interrupt to the guest, it writes the new ugenvsei CSR, supplying the VM ID and the interrupt number. When the PLIC receives this, it looks up the virtual interrupt affinity table pointed to by the vtable base registers and locates the VCPU that this virtual interrupt should be delivered to.
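The VSSI routing decision just described, with its three possible outcomes, can be summed up in a short C sketch. The enum names and the shape of the function are made up for illustration; the inputs are the result of the IFMAP lookup and the V bit of the matched hart.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_HARTS 4

/* Illustrative model of where a VSSI ends up after a write to the new
 * sgenipi CSR, given the IFMAP lookup result. */
enum vssi_outcome {
    VSSI_GUEST_DIRECT,  /* hit, V=1: guest handles it, no trap        */
    VSSI_HOST_INJECT,   /* hit, V=0: host on that hart injects it     */
    VSSI_HOST_NOTIFY,   /* miss: notification interrupt to the host,
                         * target VCPU is scheduled off               */
};

static enum vssi_outcome route_vssi(int matched_hart,
                                    const bool v_bit[NR_HARTS])
{
    if (matched_hart < 0)
        return VSSI_HOST_NOTIFY;
    return v_bit[matched_hart] ? VSSI_GUEST_DIRECT : VSSI_HOST_INJECT;
}
```

Only the first outcome is the new fast path; the other two deliberately fall back to host behaviour equivalent to the existing trap-and-emulate solution.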
Next, the PLIC performs a task similar to locating the target VCPU for a software-generated interrupt, because this kind of interrupt also travels between harts: it looks up the IFMAP registers for a matching VCPU ID, and if there is one, it delivers the virtual interrupt to that hart. If the VCPU is running, it takes the interrupt directly; if not, the host takes the interrupt and injects an SEI into the VCPU. If no match is found, a notification interrupt is delivered to the host. After the guest receives the external interrupt, it needs to handle it, which takes at least two steps: claiming the interrupt and signalling the end of interrupt processing. The guest does both by directly reading from and writing to the vsclaim register, and in our design these accesses do not trap to the host. Similar to the other virtual interrupts, KVM saves and restores the hip CSR when the VCPU transits between the running and paused states. In the paused state, the host handles the VSEIs it receives, because the PLIC believes the target VCPU is still running on the hart; the PLIC reads the same IFMAP registers to decide whether a target VCPU is executing on any hart. When the VCPU is scheduled on and off, the host maintains the pending bits for that VM, like this. When the VCPU is in the scheduled-off state, the host handles any notification interrupt for VSEIs that targets the scheduled-off VCPU. So with this extension and the corresponding KVM support, the VSEI can be delivered directly to the VM kernel while the VCPU is in the running state, as shown in this diagram. When the VCPU is not running, the host handles the interrupt and injects it into the VCPU in a similar fashion to the VSSIs.
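The trapless claim and end-of-interrupt path can also be sketched. This is a simplified model of the vsclaim semantics over the per-VM pending memory: real PLIC claims select by priority, while the sketch simply returns the lowest pending source ID, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NR_IRQS 32

/* Per-VM interrupt state over the memory the host allocates at VM
 * creation (sketch): bit i of each word refers to source i. */
struct vplic_ctx {
    uint32_t pending;  /* interrupt raised, not yet claimed        */
    uint32_t active;   /* claimed by the guest, EOI not yet signed */
};

/* Guest read of the vsclaim register: return a pending source ID
 * (0 means none, matching the PLIC convention) and move it from
 * pending to active, without trapping to the host. */
static uint32_t vsclaim_read(struct vplic_ctx *c)
{
    for (uint32_t i = 1; i < NR_IRQS; i++) {
        if (c->pending & (1u << i)) {
            c->pending &= ~(1u << i);
            c->active |= 1u << i;
            return i;
        }
    }
    return 0;
}

/* Guest write of the source ID back to vsclaim: end of interrupt. */
static void vsclaim_write(struct vplic_ctx *c, uint32_t id)
{
    c->active &= ~(1u << id);
}
```

In the existing trap-and-emulate flow, each of these two accesses is an MMIO trap to the host; here both complete as plain register accesses.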
On the processing and acknowledging side, claiming and signalling end-of-interrupt also happen without trapping: the VM kernel directly reads and writes the vsclaim CSR, which updates the state accordingly. Let's move on to the implementation and results. We have implemented the interrupt extensions above in RISC-V QEMU version 5, which provides an emulated RISC-V environment with the hypervisor extension. We are using QEMU because right now there is no RISC-V hardware with the hypervisor extension, so we have no choice. We have also implemented the necessary KVM support. On top of this, we ran benchmarks that can be compiled and run in the emulator with reasonable effort and obtained some performance results; mainly, we were able to compile and run Redis and UnixBench. Here are the results. This benchmark shows the performance boost in Redis obtained by adopting the extension for the virtual timer interrupt. With one VCPU and the virtual timer interrupt extension, we observed around 60% to 70% performance improvement compared to the original solution. The gain mainly comes from eliminating the traps needed to update the timer and to deliver the interrupt. Similar results are observed for UnixBench. This figure shows the performance boost in Redis with both the virtual timer extension and the VSSI extension. With two VCPUs, we observed around 50% performance gain compared to the original SBI-based interrupts. The gain again comes from eliminating the excessive traps to the host and to the various modes in the host. For VSEI, we ping the virtual machine from the host and measure the latency. With our extension, the latency is reduced by 11% on average. The gain also comes from eliminating traps due to I/O operations: for example, when we ping the machine 100,000 times, we observe a reduction of around 300,000 traps due to MMIO.
This number is expected, because the current flow of I/O handling does cause around three traps per interrupt handled. So we think these results are a strong indication that the extension can reduce the performance cost of frequent traps, although when implemented in real hardware the magnitude of the numbers may change, since we are still using an emulated environment. For the future, we would like to continue exploring this direction to include support for pass-through devices, which means we need an IOMMU in the first place. We still need to furnish our extension with more practical details, such as priority controls, and so on. We would also like to integrate our extension with future RISC-V interrupt controllers, as well as validate our ideas with more hypervisors. With that, I end my presentation, and thank you for your attention.