Hello everyone, I'm Anup Patel from Ventana Micro Systems. This talk is about AIA support in KVM RISC-V. The talk is divided into three parts: first, an overview of the AIA specification; next, the design and features of AIA support in KVM RISC-V; and lastly, the current software status, followed by a short demo. So let's begin with the overview of the AIA specification. AIA stands for Advanced Interrupt Architecture. It's a new specification that has been worked on in RISC-V International for the past two years now. It supports a lot of features: of course, it addresses the limitations of the RISC-V PLIC present in existing RISC-V platforms, it is scalable to a large number of harts, and it defines functionality as optional modular components. The most important thing is that it supports MSIs, with IPIs also supported as software-injected MSIs, and it supports MSI virtualization and IPI virtualization as well. Currently, the specification is in a so-called stable state, as per RISC-V International, and it will be frozen by the RISC-V Summit, which is coming this December 2022. The link over here points to the latest specification. We have three optional modular components defined by this specification. First are the extended local interrupts, provided via new AIA CSRs. Then we have the Incoming Message-Signaled Interrupt Controller, or IMSIC. And then we have the Advanced Platform-Level Interrupt Controller, or APLIC. We'll look at all of these in the coming slides. So first, the extended local interrupts. This feature is defined as two ISA extensions called Smaia and Ssaia. Smaia deals with the new AIA CSRs for M-mode, and Ssaia deals with the new CSRs for S-mode (HS-mode) and, in fact, VS-mode as well. With the AIA extended local interrupts, we have 64 local interrupts for both RV32 and RV64.
And each of these local interrupts now has a configurable priority, which was not there before, of course. Then we also have something called local interrupt filtering, by which a higher privileged mode can take an interrupt and selectively inject it into a lower privileged mode. This is available both for M-mode to S-mode filtering and for HS-mode to VS-mode filtering. The CSRs added by these extended local interrupts are totally backward compatible with the existing local interrupt mechanism defined by the RISC-V privileged specification, which means that existing software without AIA support will work totally out of the box; but of course, to use the AIA features, it will need to be extended. Moving to the MSI part of AIA: MSIs are supported using the IMSIC, or Incoming Message-Signaled Interrupt Controller. We have one IMSIC instance next to each hart, which means there is no architectural limit on the maximum number of harts supported by IMSICs; it naturally scales with the number of harts. Each IMSIC instance further has multiple interrupt files: one M-mode file, one S-mode file, and then multiple guest files (VS files); up to GEILEN VS files will be supported. Each interrupt file consumes 4 KiB of the physical address space where it is mapped. This physical address space typically has one MMIO register where devices or other harts can write to inject MSIs, while the entire interrupt-file configuration is done via AIA CSRs. Each IMSIC interrupt file can support up to 2047 interrupt identities, or MSI IDs, I would say. MSI and IPI virtualization are both supported through the VS files of each hart; of course, the hart also needs to have the H extension. This figure shows a pictorial representation of what I just said: as you can see, we have a separate IMSIC instance next to each hart.
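As a rough sketch of how these 4 KiB interrupt files might be laid out in physical address space: the exact layout is platform-specific, so the base address, the per-hart grouping of the S-mode file followed by its guest files, and the helper names below are my assumptions (modeled loosely on the QEMU virt machine), not something mandated by the spec.

```c
#include <stdint.h>

#define IMSIC_FILE_SIZE 0x1000u /* each IMSIC interrupt file occupies 4 KiB */

/* Hypothetical layout: the S-mode file of each hart is immediately
 * followed by its GEILEN guest (VS) files, so the per-hart stride is
 * (1 + GEILEN) * 4 KiB. */
static uint64_t imsic_s_file_addr(uint64_t base, uint32_t hart, uint32_t geilen)
{
    return base + (uint64_t)hart * (1u + geilen) * IMSIC_FILE_SIZE;
}

/* Guest files are numbered 1..GEILEN; guest file g sits right after the
 * S-mode file of the same hart in this layout. */
static uint64_t imsic_vs_file_addr(uint64_t base, uint32_t hart,
                                   uint32_t geilen, uint32_t guest)
{
    return imsic_s_file_addr(base, hart, geilen)
           + (uint64_t)guest * IMSIC_FILE_SIZE;
}
```

With GEILEN = 7, each hart owns a contiguous 32 KiB window (one S file plus seven VS files), which is the kind of window a device or another hart writes into to inject an MSI.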
And then we have different devices, platform or PCI devices, which can directly write MSIs to the different interrupt files. Another interesting thing about this figure is the convention we use here: the solid arrows are actual hardware lines or signals, and the dotted arrows are MSI writes. We'll follow this convention in the coming slides as well. So this figure largely shows how the IMSIC works on an MSI-only system. Next is how we support IPIs using the IMSIC. Like I said, IPIs are supported as software-injected MSIs, which means that a particular hart can directly write to an interrupt file of another hart's IMSIC. For example, if the M-mode firmware wants to inject an IPI into another hart, it will directly write to the M-mode file of that other hart. The same applies to a host OS or hypervisor, and it also applies to a guest operating system: a guest VCPU can directly write to the VS file of another hart. And because IPIs are supported as software-injected MSIs, we naturally get IPI virtualization as well. So, moving on to wired interrupts: we support these using the APLIC in AIA. With the APLIC, we have multiple APLIC domains, which are hierarchical in nature. All the wired interrupts are connected to the root APLIC domain. Each APLIC domain targets a particular privilege level associated with a set of harts, of course, and an APLIC domain can also delegate interrupts to its child APLIC domains, since these are hierarchical. The complete configuration of APLIC domains is done through memory-mapped registers. In addition, we have configurable line sensing, priority, and target hart for each interrupt source. One APLIC domain can support up to 1023 interrupt sources, and it can target up to 16384 harts.
The most notable thing about the APLIC is that we have two operating modes: direct mode and MSI mode. In direct mode, the APLIC will directly inject interrupts into the harts, with nothing in between. In MSI mode, the interrupts are forwarded as MSIs: the APLIC will take wired interrupts and, using a state machine, convert them into MSI writes targeting the IMSICs of the harts. As for the amount of physical address space consumed by each of these modes: each APLIC domain in direct mode will consume from 16 KiB up to around 528 KiB, while each APLIC domain in MSI mode will consume just a flat 16 KiB of physical address space. This figure shows a pictorial representation of the APLIC in direct mode. As you can see, all the wired interrupts land in the root domain, which typically targets the M-mode external interrupts of the different harts. The root domain can then selectively delegate some of the interrupts to the S-mode domain, and the S-mode domain will target the S-mode external interrupts of the different harts. One more thing: this figure is also aligned with low-end systems where there will be no MSIs. For low-end systems, you can use only the APLIC part of the AIA specification. Moving on to wired interrupts using the APLIC MSI mode: this figure resembles a more real-world situation, where we have a mix of wired interrupts and MSIs in the same system. All the wired interrupts again land in the APLIC root (M-mode) domain, which further delegates some of them to the S-mode domain, and then both the M-mode and S-mode domains convert these wired interrupts into MSI writes targeting some of the IMSIC files of the CPUs. We can also have different platform devices and PCI devices capable of generating MSIs, and they will directly inject MSIs into the different IMSIC files.
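The footprint numbers above can be sanity-checked with a little arithmetic. A minimal sketch, assuming a 16 KiB control region plus a 32-byte per-hart interrupt-delivery structure in direct mode; the sizes are chosen to match the figures quoted in the talk, and the names here are mine, not the spec's:

```c
#include <stdint.h>

#define APLIC_CTRL_REGION (16u * 1024u) /* 16 KiB of control registers */
#define APLIC_IDC_SIZE    32u           /* per-hart delivery structure, direct mode */

/* Direct mode: control region plus one per-hart structure per target hart. */
static uint32_t aplic_direct_mode_size(uint32_t nr_harts)
{
    return APLIC_CTRL_REGION + nr_harts * APLIC_IDC_SIZE;
}

/* MSI mode: a flat 16 KiB, with no per-hart structures at all. */
static uint32_t aplic_msi_mode_size(void)
{
    return APLIC_CTRL_REGION;
}
```

At the maximum of 16384 target harts, direct mode comes to 16 KiB + 16384 × 32 B = 528 KiB, which is exactly the upper bound mentioned above; MSI mode stays at 16 KiB regardless of hart count.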
Yeah, so this is very close to many real-world use cases. Now, regarding the AIA virtualization support in these different components. For the CSRs, we have separate VS-mode CSRs for each guest or VM, and the local interrupts for VS-mode are virtualized using the hvictl, hviprio1, and hviprio2 CSRs. IMSIC virtualization we already touched on earlier: we have multiple guest files (VS files) for each hart, which a guest will use. Each guest VCPU will be assigned one VS file, and the G-stage page table of the guest or VM will have a mapping for that VS file. We also have to select that VS file using the hstatus.VGEIN bits when the VCPU runs. The most notable and important thing is that there are no traps when a guest VCPU uses a VS file. In addition to this, a hypervisor can inject MSIs into the guest by writing to the MMIO registers of the VS file assigned to the guest VCPU, or a hypervisor can route a pass-through device's MSIs directly to the MMIO register of the VS file using some kind of IOMMU. An important piece of information here: we also have a RISC-V IOMMU specification in flight, which should again be available sometime next year. In addition, a hypervisor can also take interrupts which are meant for a VS file and handle them itself for appropriate processing, using the hgeie and hgeip CSRs. Lastly, on the APLIC front, the APLIC supports virtualization only partially: all the APLIC MMIO registers are typically trapped and emulated by a hypervisor. But particularly in MSI mode, there will be no MMIO traps at runtime; the only traps will be at boot time, because MMIO traps are required only for configuration. In direct mode, there will be MMIO traps at runtime as well, at the time of handling interrupts. So, moving to the second part of the talk: the details of AIA support in KVM RISC-V. We have two parts of the AIA support in KVM RISC-V.
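For illustration, selecting the active guest interrupt file via hstatus.VGEIN boils down to a bitfield update: VGEIN occupies bits 17:12 of the hstatus CSR. The helper below is a hypothetical sketch of that update, not KVM's actual code:

```c
#include <stdint.h>

#define HSTATUS_VGEIN_SHIFT 12
#define HSTATUS_VGEIN_MASK  (0x3fUL << HSTATUS_VGEIN_SHIFT) /* bits 17:12 */

/* Return a new hstatus value with VGEIN pointing at the given guest
 * interrupt file (1..GEILEN); writing 0 means no guest file is selected,
 * so VS-level external-interrupt state reads as zero. */
static uint64_t hstatus_set_vgein(uint64_t hstatus, uint32_t guest_file)
{
    hstatus &= ~HSTATUS_VGEIN_MASK;
    hstatus |= ((uint64_t)guest_file << HSTATUS_VGEIN_SHIFT) & HSTATUS_VGEIN_MASK;
    return hstatus;
}
```

The hypervisor would perform such an update on every VCPU load, right before entering the guest, so the hardware routes the guest's IMSIC accesses to the right VS file without trapping.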
First is how we virtualize the AIA CSRs, and the second part is how we deal with the IRQ chip, that is, the IMSIC and APLIC virtualization; we call it the AIA in-kernel IRQ chip. The CSR virtualization is always available when the underlying host has the AIA CSRs, that is, the Ssaia extension. It can't be turned off, which means the KVM hypervisor will always virtualize the AIA CSRs when they are available in the host, and it will always save and restore them. KVM user space can access this guest VCPU AIA CSR state using the ONE_REG interface. The AIA in-kernel IRQ chip consists of two parts: one optional APLIC, with MSI delivery mode only, and one IMSIC file for each guest VCPU. The APLIC is intentionally optional, because KVM user space might choose to implement an emulated APLIC in user space instead; that's totally up to user space. In fact, the whole AIA in-kernel IRQ chip feature of KVM RISC-V is optional as well: KVM user space can choose to use only the AIA CSR virtualization part of the AIA support and emulate the entire IRQ chip in user space itself. That's totally up to KVM user space. More on the KVM in-kernel IRQ chip: at any point in time, a guest VCPU might be using either a software file or a hardware VS file. A software file means the IMSIC accesses will be trapped and emulated by software in the hypervisor; a VS file means they will be virtualized by hardware. The IMSIC VS file assigned to a guest VCPU must be updated when the underlying hart changes, which means that if a VCPU is moved from one hart to another, we of course need to change the VS file assigned to that VCPU and use a VS file from the new hart. This is an interesting and challenging part of AIA support in KVM; we'll look into the details in the coming slides. Apart from this, in general the KVM AIA IRQ chip has three operating modes.
First is the emulation mode, where all the IMSIC files are always software files, which means all the IMSIC files are always trapped and emulated. Then we have the hardware-accelerated mode, where all the IMSIC files are always VS files, which means the KVM hypervisor will always use hardware acceleration for the guest VCPUs. Of course, the hardware-accelerated mode will only work if the underlying host has hardware VS files available; if they're not available, we can't use this mode of the in-kernel IRQ chip. The last and most important mode is the auto (automatic) mode, where the KVM hypervisor dynamically switches between VS files and software files based on the availability of a VS file on the underlying host CPU. In other words, this mode is all about dynamically switching between hardware-accelerated and software-emulated interrupt files. Again, this mode is only available when the underlying host has actual hardware VS files. The most important part about this mode is that it allows creating and running more VCPUs than the actual number of VS files available on a hart, that is, more than GEILEN. So basically it makes the whole VS file selection scalable and dynamic. Setting up the in-kernel IRQ chip from KVM user space is a three-step process. First, KVM user space will create an AIA device using the KVM create-device ioctl. Then it will set a bunch of configuration parameters of the AIA in-kernel IRQ chip. The mandatory ones are the number of interrupt identities (MSI identities), the hart index bits of the MSI base address, and the IMSIC base address of each VCPU. Apart from these, if KVM user space wants to use the in-kernel APLIC support, it can also set the number of interrupt sources and the APLIC base address.
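One of those mandatory parameters, the hart index bits, is essentially the number of address bits needed to distinguish the per-hart IMSIC regions inside the MSI base address. A minimal sketch of how user space might derive it from the VCPU count; the function name is mine, not a KVM API:

```c
#include <stdint.h>

/* Number of bits needed to encode hart indices 0..nr_vcpus-1 in the
 * MSI (IMSIC) base address, i.e. ceil(log2(nr_vcpus)), with a minimum
 * of 1 bit for the degenerate single-VCPU case. */
static uint32_t aia_hart_index_bits(uint32_t nr_vcpus)
{
    uint32_t bits = 1;

    while ((1u << bits) < nr_vcpus)
        bits++;
    return bits;
}
```

For example, a 9-VCPU guest needs 4 hart index bits, since 3 bits only cover indices 0 to 7.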
So once all this configuration is done, the final step is for KVM user space to finalize things by doing the AIA control init (the CTRL_INIT device attribute). Once that init is done, we can't change the configuration we just did in step two, and the KVM hypervisor will only emulate or virtualize the AIA after the init has been done. Without the init, the KVM IRQ chip will not function. Accessing the KVM in-kernel IRQ chip from user space after it is initialized is very straightforward. To inject emulated IRQs, we have two ioctls: KVM_IRQ_LINE and KVM_SIGNAL_MSI. To access the APLIC registers, we simply use the get/set device-attribute ioctls, and the same applies to the IMSIC registers: we use the same get/set device-attribute ioctls. We can actually save and restore the APLIC and IMSIC context at runtime using these two ioctls. Moving on to virtual IPIs using IMSIC VS files. As I already said, to inject an IPI, a guest VCPU can directly write to the VS file of another target VCPU. This figure shows how a VCPU A running on hart 0 directly injects an IPI into some other VCPU B of the same VM running on some other hart X. VCPU A will typically do an MMIO write using a guest virtual address, which will be translated using two-stage translation, and the MMIO write will land in the VS file being used by VCPU B. The VS file will then signal the interrupt to hart X's CSRs, and the hart's CSRs will in turn inject the interrupt into the software, which means VCPU B will take the interrupt whenever it runs. As you can see, there are no traps involved in injecting an IPI from VCPU A to VCPU B; it's totally trap-less. So this is how IPI virtualization works with AIA IMSIC VS files. Looking at emulated devices next: this is another very common case, since typically real-world VMs will have a lot of emulated devices in user space or in kernel space.
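Conceptually, the trap-less IPI above is just a store of an interrupt identity into the target file's set-pending register. Here is a toy model of that flow, with the register's effect modeled as an array of pending bits; the `seteipnum` name mirrors the IMSIC's set-external-interrupt-pending register, but this is a simulation of the behavior, not hardware access:

```c
#include <stdint.h>

#define IMSIC_MAX_IDS 2048u /* identities 1..2047 are usable */

/* Toy model of one IMSIC interrupt file: a pending flag per identity. */
struct imsic_file {
    uint8_t pending[IMSIC_MAX_IDS];
};

/* A guest VCPU "injects an IPI" by writing the interrupt identity to the
 * target VS file's seteipnum register; in this model that just sets the
 * corresponding pending flag. Identity 0 and out-of-range identities are
 * ignored, matching the valid-identity range of 1..2047. */
static void imsic_write_seteipnum(struct imsic_file *f, uint32_t id)
{
    if (id > 0 && id < IMSIC_MAX_IDS)
        f->pending[id] = 1;
}
```

The real hardware does the equivalent of this on the MMIO write path, which is why no hypervisor trap is needed: the pending state lands directly in the VS file that VCPU B is using.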
So this figure shows how KVM user space can inject an emulated IRQ using an ioctl. Typically, KVM user space running on some hart 0 will do a KVM ioctl, either KVM_IRQ_LINE or KVM_SIGNAL_MSI. The KVM hypervisor in HS-mode will take the ioctl and convert it into an MMIO write. The MMIO write will eventually land in the VS file associated with the target VCPU which is supposed to take the interrupt. The VS file will signal the interrupt to the hart's CSRs, and the hart's CSRs will in turn inject the interrupt into VCPU B whenever it runs. Again, as we can see, there is no extra trap taken on the hart which is running VCPU B; there is only the ioctl, which is anyway involved for any emulated interrupt. Another very important use case is routing device MSIs directly to the guest or VM. Let's say we have a system with some IOMMU, or rather the RISC-V IOMMU which is being defined. In this case, as we can see in this figure, we might have a platform or PCI device which is directly assigned to a VM, and it will generate an MSI write based on a guest physical address (GPA). The IOMMU G-stage will translate this MSI write into a host physical address MSI write, and the MSI write will eventually land in the VS file; and it's the same story: the VS file will signal the interrupt to the hart's CSRs, and whenever the receiving target VCPU runs, it will take the interrupt immediately. Over here as well, there are no extra interrupts or traps taken by the hypervisor to mediate this whole process; the MSI is directly injected into the guest or VM. All these virtualization features of AIA support are fine, but another interesting aspect is that a VCPU which is directly taking device interrupts, or directly taking IPIs from other VCPUs via VS files, might sleep as well when there is no work, of course.
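To make the pass-through path above concrete, the key step is the IOMMU's G-stage translating the device's MSI write address from a guest physical address to the host physical address of the VS file. This is a deliberately tiny model of that lookup, using a flat table of 4 KiB mappings in place of a real G-stage page table; all names here are mine:

```c
#include <stdint.h>

/* Toy G-stage: a small table of 4 KiB guest-physical -> host-physical
 * mappings, standing in for the IOMMU's G-stage page table. */
struct gstage_map {
    uint64_t gpa;
    uint64_t hpa;
};

/* Translate a device's MSI write address (a GPA) into the host physical
 * address it lands at (e.g. a VS file register); returns 0 if unmapped. */
static uint64_t iommu_gstage_translate(const struct gstage_map *map,
                                       int n, uint64_t gpa)
{
    for (int i = 0; i < n; i++)
        if (map[i].gpa == (gpa & ~0xfffULL))
            return map[i].hpa | (gpa & 0xfffULL);
    return 0;
}
```

Since the hypervisor set up this mapping once, every subsequent device MSI is delivered with no hypervisor involvement at all.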
So when a VCPU sleeps using something called a WFI trap, what happens is that the KVM hypervisor will actually reroute the VS file interrupts to itself. And when an actual interrupt arrives while the VCPU is sleeping, the KVM hypervisor will wake up the VCPU and resume it so that it can process the interrupts. This is a very important aspect of getting seamless interrupt virtualization behavior with AIA. Moving to the most challenging part that we faced in AIA support in KVM RISC-V: how do we move the VS files? Like I said, VS files are per hart, or per CPU. So whenever a VCPU moves from one hart to another, we have to change the VS file assigned to the VCPU, of course, which means that if it moves from hart A to hart B, then we have to pick a VS file from the IMSIC associated with hart B. This is a bit of a complicated process in AIA virtualization, and for KVM, we have to do these things in the run loop itself, and the run loop is a really, really performance-sensitive area. As the flow in this figure shows, the process of updating the IMSIC file of a VCPU in the run loop is divided into a fast path and a slow path. Most of the time we'll take the fast path, where we won't have much overhead updating the IMSIC file every time we are in the run loop. But sometimes we'll have to go to the slow path, particularly when the VCPU changes the underlying host hart. Within the slow path, you can see there are two further cases, the red boxes and the yellow boxes. The yellow boxes cover the path where, after moving to a new hart, the VCPU is able to acquire a new VS file; they show the flow of moving from the old file to the new VS file, including moving the state. The red boxes cover the case where, after moving to the new hart, the VCPU is not able to acquire a VS file. In that case, what we do is end up falling back to the trap-and-emulate mode.
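The run-loop decision above can be sketched as a toy model, assuming `prev_hart` tracks where the VCPU last ran and a simple free-file counter per hart; all names and the counter mechanism are my assumptions, and KVM's actual bookkeeping is more involved:

```c
#include <stdbool.h>
#include <stdint.h>

struct hart_state { uint32_t free_vs_files; };
struct vcpu_state { int prev_hart; bool hw_file; };

/* Returns true when the fast path was taken (nothing to update), false
 * when the slow path ran. On the slow path, try to acquire a VS file on
 * the new hart ("yellow boxes"); if none is free, fall back to a
 * trap-and-emulated software file ("red boxes"). */
static bool aia_update_file(struct vcpu_state *v, struct hart_state *harts,
                            int cur_hart)
{
    if (v->prev_hart == cur_hart)
        return true; /* fast path: still on the same host hart */

    /* Slow path: release the old VS file, if we held one. */
    if (v->hw_file && v->prev_hart >= 0)
        harts[v->prev_hart].free_vs_files++;

    if (harts[cur_hart].free_vs_files > 0) {
        harts[cur_hart].free_vs_files--;
        v->hw_file = true;  /* yellow: acquired a VS file on the new hart */
    } else {
        v->hw_file = false; /* red: none free, software emulation */
    }
    v->prev_hart = cur_hart;
    return false;
}
```

This also shows why auto mode scales past GEILEN: when more VCPUs than VS files pile onto one hart, the losers simply run with software files until a hardware file frees up.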
And if there were some devices assigned to the VM which were injecting MSIs directly, then in that case we'll also have to update the IOMMU to use something called a memory-resident interrupt file (MRIF), which is a feature of the RISC-V IOMMU being developed. So as you can see, all the cases are covered; it's just that most of the time we'd like to stay on the fast path, for performance reasons, of course. And to get the most performance, we'd encourage that VCPUs don't change host harts frequently. Moving to the last part of the talk: the software status, and then the demo. The complete proof-of-concept implementation has been done in QEMU, OpenSBI, Linux KVM, and KVMTOOL, and we support both device tree and ACPI for this. All the device-binding and ACPI discussion has already happened on the RISC-V International mailing lists and in the community meetings. We still need to send out the patches to the Linux mailing list, and we also need to send out the ACPI ECRs for the AIA to the UEFI Forum. All the upstreaming for QEMU and OpenSBI has already been done; the only pending upstreaming work is Linux KVM and KVMTOOL. The links over here point to the patches and the branches which contain these patches. So we are in very good shape as far as software goes for AIA, and we hope to send out the patches for KVM and Linux very soon. Now it's time for a short demo. Just a minute. Over here, what we will do is try different cases with AIA, but first we'll try the vanilla case. As you can see, there is no AIA being specified on the QEMU command line here. And when we run this, we will see that KVM will not use AIA virtualization, because there is no AIA in the host in any form. We can confirm that with cat /proc/cpuinfo.
And now if we just launch the VM, it will simply launch without AIA support, and the entire IRQ chip will be emulated in software by the KVM user space, that is, KVMTOOL here. As you can see, it's booting the VM, and it's using the PLIC; you can see it's using the PLIC because there is no AIA support at all. And yeah, it's up. So let's look at the next demo. Let's say we just want to have the AIA interrupt controllers in the virt machine, but without VS files. What will happen if we don't give any VS files? As you can see, to enable AIA in QEMU, we just pass aia=aplic-imsic as an additional parameter to the virt machine name, and it will create a machine with IMSIC and APLIC support. So we can see that the IMSIC is enabled: the host is now using the IMSIC, and it's also using IPIs via the IMSIC. We can also see that there is an APLIC, and the APLIC is routing interrupts as MSIs; that's what it has printed here. And we can also see the AIA CSRs in /proc/cpuinfo: both the M-mode and S-mode CSRs are present on all the CPUs. Now, if we create the VM with the same set of binaries, it will detect that there is AIA support in the kernel and that it's usable, and it will use the in-kernel IRQ chip as well as the KVM AIA CSR virtualization. It's running the same kernel inside the VM as well. As you can see, the VM is now using the IMSIC instead of the PLIC. But it is still doing IPIs through SBI, because KVM has detected that hardware acceleration is not there, which means the auto and hardware-accelerated modes are not available, so it hinted to the guest OS not to use IPIs via the IMSIC; it's falling back to SBI IPIs. Basically, in this case, since there are no VS files, the entire IMSIC is trap-and-emulated by software, and the same applies to the APLIC.
So the APLIC is also trap-and-emulated by software. And as you can see, it booted fine, and even /proc/cpuinfo in the guest reports that it has the S-mode AIA CSRs. Now let's look at a more full-fledged demo where we have VS files as well. The only difference compared to the previous case is that we have APLIC and IMSIC in the virt machine, and we also have seven guest (VS) files on each hart. If we run this, as usual, the host doesn't see much difference: it will continue to use the IMSIC, it will also use IPIs through it, and it will also have the APLIC as usual; and there's no change in /proc/cpuinfo, we have both sets of CSRs available. And then let's try to launch a VM. Now if we launch a VM, we'll see some interesting stuff, because hardware acceleration is now available in the underlying host, since we have VS files. KVM user space will detect the fact that the auto and hardware-accelerated modes are supported, and it will opt for the auto mode. And you'll see that the IMSIC VS files will now be hardware-virtualized rather than trap-and-emulated. This is evident from the fact that the guest now does IPIs within the VM itself through the IMSIC, using the VS file directly, because that's what KVMTOOL hinted it to do. And the APLIC is still trap-and-emulated, but it's in MSI mode, so there are no traps at runtime for the APLIC. And you can see that we again have AIA support in both the host and the guest. So yeah, this is full-blown virtualization using AIA, running entirely on QEMU, using all the features of AIA and all the virtualization features as well. This is pretty much it for the demo; let's move back to the slides. So we're done with the demo. I hope this was interesting for all of you. Thank you for spending time attending this session. Please enjoy the rest of the KVM Forum. Thank you.