Hello, everyone. I'm Vandana. I have been working with the Linux kernel and device drivers for almost 20 years. I have worked on device driver development for embedded systems, including NVIDIA platforms, and on various kernel subsystems such as security and memory management. I also conduct trainings on device drivers and Linux kernel internals whenever I get the time. Today I am going to talk about the kernel subsystems that provide an interface to user space for doing driver development in user space: the UIO framework and the VFIO framework. The agenda is to understand what the UIO framework is, how it works, and how user space applications can make use of it. Once we are done with UIO, we will look into the VFIO interface, the Virtual Function I/O interface, which is particularly useful in virtualized environments, and then look at how the VFIO driver interface is used.

To begin with, let's start with the UIO driver interface. UIO is basically a user space I/O interface, and it lets us write the majority of the driver functionality in user space. Normally, when applications want to interact with hardware, the corresponding functionality has to be supported by a driver in kernel space. Having all the processing in the kernel address space has its benefits, but also some disadvantages. The alternative is to implement the driver in user space, and the UIO driver framework is one such interface: it allows most of the driver functionality to be written in user space, while only a very small part has to be implemented in kernel space.
The UIO interface makes use of the character device driver interface and the sysfs interface to interact with user space applications. It has been available for a long time, since kernel 2.6.23. The basic functionality it provides is device access, interrupt handling, and memory mapping. Although UIO allows the driver functionality to be implemented in user space, a small part still has to be implemented in kernel space: setting up the device and registering the interrupt handler, so that the interrupt handler can pass interrupts on to the user space application.

As mentioned, the UIO framework provides a character device interface in the form of device file entries named /dev/uio0, /dev/uio1, and so on, one per device programmed through the UIO interface. This device entry is used to access the address space of the device; all communication with the device goes through it. Each device may have one or more memory regions, which provide the path for configuration as well as for data transfer. All these memory regions can be mapped into user space through the mmap interface, and the application can access them once the mapping is done. The memory mapping information is provided to user space through sysfs entries. Most of the other device information is presented to the application through the /sys/class/uio directories: for a single instance you will have the directory uio0, and for multiple device instances each will have its own directory (uio0, uio1, uio2, ...) with the corresponding attribute entries in that sysfs directory structure.
So, as we see, these attributes can be read and written through sysfs, and that is the directory path in the sysfs file system. Now, when we talk of driver development, the basic functionality a kernel driver provides is access to the device, that is, the device memory, which might be register memory or data memory. Along with that, a kernel driver needs to take care of responses and notifications from the device, which come in the form of interrupts. In the UIO model, that interrupt handler has to be registered by the kernel part of the UIO driver. From the user space point of view, the application makes a blocking read() system call on the device entry to collect interrupt notifications, or it can use select() to wait for an interrupt. For devices that do not generate interrupts, a polling mechanism can be implemented instead, for example by setting up a timer handler in the kernel part and triggering the interrupt notification at a configurable time interval. There is an API used inside the kernel part of the code for sending these interrupt notifications to user space applications.

So these are the two interfaces through which a user space application accesses UIO: device access, which goes through the /dev/uioX device file entries, where the application can use read() system calls to get interrupt information; and data transfer, where the memory regions are mapped by making use of the mmap() system call and the basic data transfer then happens through reads and writes on the mapped memory.
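The user space side just described can be sketched as a few C helpers around an open /dev/uioX file descriptor. This is a minimal sketch, not a complete program; the helper names are mine, and the only real UIO convention it relies on is that map N of a device is selected purely by the mmap() offset N * page_size, and that a 4-byte read() returns the interrupt event count.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* UIO exposes memory region N of /dev/uioX at file offset
 * N * page_size, so the mmap offset selects the map. */
off_t uio_map_offset(int map_index)
{
    return (off_t)map_index * sysconf(_SC_PAGESIZE);
}

/* Block until the next interrupt. The 4-byte read returns the total
 * number of events so far; returns -1 on a short or failed read. */
int uio_wait_irq(int fd, uint32_t *event_count)
{
    return read(fd, event_count, sizeof(*event_count))
               == (ssize_t)sizeof(*event_count) ? 0 : -1;
}

/* Map memory region `map_index` (of `size` bytes, taken from the
 * sysfs maps/mapN/size attribute) of an open /dev/uioX fd. */
void *uio_map_region(int fd, int map_index, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, uio_map_offset(map_index));
    return p == MAP_FAILED ? NULL : p;
}
```

A typical application loop would open /dev/uio0, mmap the register region once, and then call uio_wait_irq() in a loop, touching the mapped registers between wakeups.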
All this information, as we said, is available through sysfs attributes for each device working under the UIO interface. The standard information provided is the name of the device, the version, and "event", which tells the number of events handled by the driver, that is, the number of interrupts. When memory mapping is set up, for every memory region a corresponding maps/mapX directory is created in the sysfs entry, which contains the information about that mapped region: the address that has been mapped and the size of that memory region.

On the kernel side, the UIO driver makes use of the struct uio_info data structure (with a struct uio_mem entry per memory region), which stores all the information corresponding to that device, such as the number of memory regions it has. There are standard APIs used to register the UIO device with the kernel and to unregister it, and, as we have seen, uio_event_notify() is the API used to notify interrupts to a user space application that might be waiting on a read() or select() system call. Any questions so far?

To summarize: this is the user space interface that helps you write drivers in user space. One benefit of user space drivers is that they provide higher performance and reduced latency, since data transfers happen directly to memory that is mapped into user space. One limitation is reduced support for interrupt handling: MSI interrupts (PCI MSI) in particular are not handled well by the UIO interface.
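As a rough sketch of the kernel-side counterpart, here is how a minimal UIO platform driver might fill in struct uio_info and register it. The device name, register address, and size are placeholders for this sketch, not taken from any real device; for a device without interrupts, one would set irq to UIO_IRQ_CUSTOM and call uio_event_notify() from a timer instead.

```c
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/uio_driver.h>

static struct uio_info my_uio_info;

static irqreturn_t my_irq_handler(int irq, struct uio_info *info)
{
	/* Acknowledge/mask the interrupt in the device registers here;
	 * the UIO core then bumps the event count and wakes readers. */
	return IRQ_HANDLED;
}

static int my_probe(struct platform_device *pdev)
{
	my_uio_info.name = "my_uio_device";   /* shows up in sysfs */
	my_uio_info.version = "0.1";

	/* Expose one physical register window as map0 (placeholder
	 * address and size). */
	my_uio_info.mem[0].name = "registers";
	my_uio_info.mem[0].addr = 0x10000000;
	my_uio_info.mem[0].size = 0x1000;
	my_uio_info.mem[0].memtype = UIO_MEM_PHYS;

	my_uio_info.irq = platform_get_irq(pdev, 0);
	my_uio_info.handler = my_irq_handler;

	/* Creates /dev/uioX plus the /sys/class/uio/uioX attributes. */
	return uio_register_device(&pdev->dev, &my_uio_info);
}
```

The probe/remove plumbing of the platform driver is omitted; uio_unregister_device() would be called on removal.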
Also, UIO needs root privileges to access /dev/uioX. Now, think of a virtualized environment, where an application, such as the virtual machine itself, wants to access a device directly. One option is UIO, but it has the limitations we just saw. So there is another framework in the Linux kernel, called VFIO, Virtual Function I/O, that is used instead. It is used for higher performance there: the VMs can directly access the device, it provides support for MSI interrupts, and it does not need root privileges. A particular device can be assigned to a virtual machine, and the VM can access the device directly through the VFIO interface.

VFIO basically provides a device-agnostic framework for exposing devices directly to user space in a secure, IOMMU-protected environment; we will see how the IOMMU is used along with this framework. What the VFIO driver framework provides is full device access, DMA data transfer support, and interrupt handling support. Device resources are accessed through read/write and memory mapping, and event handling goes through the eventfd mechanism. It also provides support for the IOMMU APIs, and bus support has been added for both PCI devices and platform devices. This framework is very commonly used by DPDK and by the user space NVMe stack, which provide the networking and storage stacks in user space. The next diagram is taken from a presentation by Alex Williamson, given sometime back in 2011 or 2012, I think.
The diagram helps you understand, in a virtualized environment, how a user space application, here QEMU, makes use of the VFIO interface to access devices. Basically, VFIO decomposes the physical device into a set of user space APIs, and at the other end QEMU recomposes that physical device into a virtual device.

To understand the whole VFIO framework, there are three concepts: devices, groups, and containers. A device represents a particular device entity; VFIO creates a programming interface for it, made up of interfaces that provide I/O access, interrupt support, and DMA support. User space can use this interface to get device information as well as to configure the device. At times devices cannot be isolated individually, or a number of devices may need to be treated together as one isolatable unit. That is where groups come into the picture: a group combines a set of devices that is isolatable from the other devices in the system. A set of devices that is to be assigned to a particular VM can be grouped together using this concept. The support is provided by the kernel subsystem in the form of ioctl functions to program the devices in a particular group. And when a number of groups have to be combined, they can all be combined to form a container. This is basically used to provide isolation of those devices from the rest of the environment. So this gives an overall idea of how user space, that is, QEMU, communicates with the VFIO framework when it is talking to a device interface.
The device might be a PCI device, or it might be an IOMMU-attached platform device altogether. And there are the three sets of APIs, container APIs, group APIs, and device APIs, which come in the form of ioctl commands into the VFIO framework, which then directs them to the particular device.

Okay, so to look at how VFIO works, consider the basic functionality needed when an application talks to a device: how the driver programs the device, how the device responds or signals notifications back to the driver, and how the actual data transfer happens. The interface used by the VFIO framework is again a character device file interface, /dev/vfio/vfio, and this interface is used to manage and perform the operations on the VFIO framework.

The first thing is how to program the device. Let's take the example of a PCI device and see how a driver programs it. As we are aware, each PCI device is accessible through its own PCI configuration space, and through the configuration space the driver gets to know what memory regions the device provides, along with other information. To give a little more detail about the PCI config space, here is output from the lspci command, taking the particular example of a VGA controller. If you look at the output, it provides the information about the memory regions of that device: here we see four regions, where region 0 is an I/O port region starting at address 0xd010 with size 64, while regions 1 and 2 are MMIO memory regions, and it also provides the expansion ROM region.
So these are the regions the PCI device provides, through which communication with the device happens. In a kernel PCI driver, these regions are ioremapped, that is, mapped into the kernel virtual address space. When we are using VFIO, these regions are instead exposed through the VFIO interface and mapped into user space memory. This information is again provided to user space: as we said, each device is exposed as a device file, and the different regions are mapped at different file offsets of that device file.

To understand this better, we will take the vfio-pci driver as an example. This driver resides in the drivers/vfio/pci directory in the kernel sources, and basically the whole of its functionality is provided by populating the vfio_device_ops data structure, which implements the set of file operations for that device file. That provides the device information, and this information can be extracted from user space through the ioctl interface: the overall information about the device, such as how many memory regions it has, what its interrupt numbers are, and so on.

So, as we discussed, the application interacts with the device to get information and to configure some of its settings. A number of ioctl commands are implemented in this framework, and here I will go through some of the basic ones to give an understanding of how this interface works. The device properties are discovered through ioctls by the user space application. To get the device information, we have the ioctl command VFIO_DEVICE_GET_INFO.
Then, to get the information about the various regions, the memory regions and the option ROM if one is provided, there is a get-region-info ioctl. And for the interrupts there is an IRQ info ioctl: how many interrupts are supported and what the properties of those interrupts are.

To go into a bit of detail for each of these ioctls: when the application wants to know the device information, it issues the VFIO_DEVICE_GET_INFO ioctl, and the information is passed through a data structure called struct vfio_device_info. There we first see the basic things: the number of regions that are part of that particular device, and the number of IRQs supported. Along with that there is a flags field, which provides details such as whether this is a platform device or a PCI device, and the capabilities.

Similarly, once the application knows how many regions are available, the information for each region can be extracted with another ioctl command, the get-region-info ioctl. Its parameters are the region index, the size of that region, and the offset. For a PCI device, the regions available to the application can be the PCI configuration space, the option ROM region, or any one of the BAR regions; PCI supports six BARs, indexed BAR0 to BAR5. So the application gets the size and offset of the region at the index it provides. And the get-IRQ-info ioctl provides the information about the interrupts, which the application uses to then wait for interrupts.
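The discovery sequence just described might look like this in user space. This is a sketch with minimal error handling; vfio_query_device() is my own helper name, and it assumes a device fd already obtained from the group via VFIO_GROUP_GET_DEVICE_FD. The struct and ioctl names come from the uapi header linux/vfio.h.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Query a VFIO device fd for its region/IRQ counts, then walk the
 * regions, printing the size and file offset of each. */
int vfio_query_device(int device_fd)
{
    struct vfio_device_info dinfo = { .argsz = sizeof(dinfo) };
    __u32 i;

    if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &dinfo) < 0)
        return -1;
    printf("regions=%u irqs=%u pci=%d\n", dinfo.num_regions,
           dinfo.num_irqs, !!(dinfo.flags & VFIO_DEVICE_FLAGS_PCI));

    for (i = 0; i < dinfo.num_regions; i++) {
        struct vfio_region_info rinfo = {
            .argsz = sizeof(rinfo),
            .index = i,
        };
        if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &rinfo) < 0)
            continue;
        /* For vfio-pci, indices 0-5 are BAR0-BAR5, index 6 is the
         * expansion ROM and index 7 is the PCI config space; the
         * region is then mmap'ed or read at rinfo.offset. */
        printf("region %u: size=0x%llx offset=0x%llx\n", i,
               (unsigned long long)rinfo.size,
               (unsigned long long)rinfo.offset);
    }
    return 0;
}
```

The fixed region-index layout for vfio-pci is what lets the application find the config space without any further negotiation.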
Okay, so it gives the number of interrupts, and the various flags tell the properties of each interrupt: whether it is maskable, unmaskable, or automasked, that is, automatically masked by the VFIO driver. Based on its requirements, the application can check these conditions. These are simple, straightforward ioctl commands through which the application gets the information about the device.

Now let's see how the interrupts are handled in user space. VFIO makes use of the eventfd mechanism to notify the user space application of events. The file descriptor that is returned can be used by the application with read(), poll(), or select() to collect the events, that is, the interrupts that have been generated. The interrupt goes from the device to the VFIO framework, and the framework passes it on to user space through the eventfd mechanism.

So we have seen how the device can be programmed and how all the device information is made available to user space directly, without going into kernel space. The other major part of device handling is the data transfer. The standard mechanism there is DMA, which provides the interface for reads and writes between system memory and device memory. VFIO provides this support through the IOMMU, the memory management unit for I/O. The basic functionality of the IOMMU is to do the translation of the I/O address space, and along with that it also provides device access isolation. Why are we talking about that? Because DMA poses a risk to overall system integrity: it can allow a device to read and write arbitrary system memory.
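A sketch of wiring an eventfd to a VFIO interrupt, assuming a vfio-pci style device fd; the function name is mine, error handling is minimal, and the structures come from linux/vfio.h. The eventfd is passed to the kernel inside the variable-length data[] tail of struct vfio_irq_set.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hook an eventfd up as the trigger for IRQ index `irq_index` of a
 * VFIO device fd; the kernel then signals the eventfd on every
 * interrupt. Returns the eventfd, or -1 on failure. */
int vfio_irq_eventfd(int device_fd, uint32_t irq_index)
{
    int efd = eventfd(0, 0);
    struct vfio_irq_set *set;
    size_t sz = sizeof(*set) + sizeof(int32_t);
    int ret;

    if (efd < 0)
        return -1;
    set = calloc(1, sz);
    set->argsz = sz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = irq_index;      /* e.g. VFIO_PCI_MSI_IRQ_INDEX */
    set->start = 0;
    set->count = 1;
    memcpy(set->data, &efd, sizeof(int32_t));  /* fd to be signaled */

    ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
    free(set);
    if (ret < 0) {
        close(efd);
        return -1;
    }
    return efd;
}
```

A blocking read() of 8 bytes on the returned fd then waits for the next interrupt; the counter value read tells how many fired since the last read.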
That risk can be mitigated with the help of the IOMMU: by providing isolation, devices can be isolated from each other and prevented from arbitrary memory access. This support is not present in the UIO interface that we saw previously. And this device isolation is helpful in a virtualized environment: for a particular VM, its particular set of devices and their corresponding memory addresses are isolated from the other sets of devices and memory regions.

One issue with the IOMMU is that it cannot always provide isolation at the granularity of a single device. That problem is solved by making use of IOMMU groups: a set of devices is put together in a group which the IOMMU can identify, and that group becomes the isolated entity altogether. So VFIO is built on the ability to isolate devices using the IOMMU, which makes the DMA access secure.

So how does a device come under the control of the VFIO interface? Let's say we want accesses to a particular device to go through the VFIO interface, taking the example of a PCI device that is visible through lspci. When the system boots up, the PCI subsystem scans the PCI devices, and based on the configuration it loads the host PCI drivers for all those devices that are present. To put that particular device under VFIO control, what we have to do is unbind the device from the host driver by going through the sysfs entries, and then bind the same device to the VFIO driver; since we are talking about a PCI device here, it will be bound to the vfio-pci driver. Once it is bound, it is accessible through the VFIO interfaces.
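The unbind-and-rebind sequence just described can be sketched with sysfs writes. The PCI address below is a placeholder (substitute your device's domain:bus:dev.fn as shown by lspci -D), the commands need root, and this uses the driver_override mechanism, which is one of the ways of binding a device to vfio-pci.

```shell
# Hypothetical PCI address; substitute your own device's.
BDF=0000:06:0d.0

modprobe vfio-pci

# Release the device from its current host driver (if bound).
echo "$BDF" > /sys/bus/pci/devices/$BDF/driver/unbind

# Force the next probe of this device to pick vfio-pci, then reprobe.
echo vfio-pci > /sys/bus/pci/devices/$BDF/driver_override
echo "$BDF" > /sys/bus/pci/drivers_probe

# The IOMMU group the device belongs to shows up here, and a matching
# group file appears under /dev/vfio/.
readlink /sys/bus/pci/devices/$BDF/iommu_group
ls /dev/vfio/
```

Every device in the same IOMMU group has to be released from its host driver this way before the group becomes viable for VFIO.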
And since the device access goes through the IOMMU, if the group has multiple devices, each of the devices has to be bound separately to the VFIO driver. The example here shows one group with a single device and another group with two devices; 42 and 23 are the group numbers.

As we have said, the IOMMU group is the preferred granularity to ensure secure access for the devices in that particular group. But there are some IOMMUs which make use of page tables that you might want to share between different groups. That can be achieved by putting those groups together to form a container: VFIO makes use of the container class to hold one or more groups. And why would we need to do that? If the page tables are shared, that helps reduce the overall overhead of the system by reducing TLB thrashing and duplicate page tables, which would otherwise impact I/O performance. So that is achieved by creating containers.

The VFIO framework provides a set of ioctl calls to do this programming, that is, how groups are combined together to create a container, and there is an ioctl command used to create the container and add groups into it. This is a brief overview of the ioctls that operate on the groups and the containers, including setting the IOMMU type and working with the mapping and unmapping of the DMA memory regions: at the group level we have set-container, and at the container level, getting the container information and working with the IOMMU types.
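Putting the container and group ioctls together, a minimal sketch might look like the following. The helper is my own, error handling is mostly elided, and the group path is an example (group 42 from the discussion above); the ioctl names and structures come from linux/vfio.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/vfio.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Attach one group (path like "/dev/vfio/42") to a fresh container,
 * enable the type1 IOMMU backend, and map `len` bytes of user memory
 * at I/O virtual address `iova` for device DMA. */
int vfio_container_map(const char *group_path,
                       void *buf, size_t len, uint64_t iova)
{
    int container = open("/dev/vfio/vfio", O_RDWR);
    struct vfio_group_status status = { .argsz = sizeof(status) };
    struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
    int group;

    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        return -1;

    group = open(group_path, O_RDWR);
    if (group < 0)
        return -1;
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        return -1;  /* some device in the group is still bound elsewhere */

    /* Group joins the container, then the container picks its IOMMU model. */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uintptr_t)buf;
    map.iova  = iova;
    map.size  = len;
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```

After this, a device fd for any device in the group can be fetched with VFIO_GROUP_GET_DEVICE_FD, and the device can DMA into `buf` using the chosen IOVA.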
And mapping and unmapping are done by making use of the DMA map and unmap ioctls. Now, I don't currently have access to my hardware setup, which is why I have not been able to show you a demo, so I will just describe how VFIO is used. Taking a PCI device again: first unbind the device from the host driver, then load the VFIO driver, that is, the vfio-pci driver, and once that is available, bind the device to the vfio-pci driver interface. All of these steps go through the sysfs device entries. And that brings us to the end of this brief introduction to the UIO and VFIO driver frameworks. Any questions?

We have some virtual questions, if you don't mind me reading them off. "What particular end-user use cases or workloads benefit from the VFIO and UIO frameworks? Any performance improvement compared with traditional techniques?"

So, VFIO and UIO provide direct access from user space itself. One benefit is that this eliminates the need to copy data from kernel space to user space, and DMA is done directly into the user buffers. That improves I/O performance and reduces the latency of data transfers.

And I think there is a follow-up: "What are the known drawbacks of the frameworks?"

For UIO, one drawback is that it has limitations on interrupt handling, and the other is that it needs root privileges. To eliminate those limitations, the VFIO interface was added, which also solves the problem of secure access: when VFIO is used together with the IOMMU, the security aspect is handled by isolating the devices and their memory regions.
Okay, I have one more on here as well. On the end-user use cases: "One such example is DPDK and related projects (SPDK, etc.). In these projects, user space access to the underlying device is necessary to reach performance numbers an order of magnitude faster than going through the kernel. DPDK is focused on user space access to the NICs for faster packet processing." Do you have anything to comment on that?

Yes; DPDK uses the VFIO interface to memory-map those device regions into user space. Once the memory and the interrupt settings are available, the DPDK stack programs the data flow through the memory mapping done by VFIO.

Okay, we have two more on here, do you feel up for answering them? "How are IOMMU groups typically assigned? Is this purely platform dependent, or can the user make changes to which devices are in which groups?"

So yes, the user, through the ioctl system calls, can create the groups and then assign the devices into a group.

Okay, and the last one: "Is there any relationship between VFIO and virtio?"

Sorry, can you repeat? "Is there any relationship between VFIO and VIRTIO?" Yes. virtio is the interface that, again in a virtualized environment, provides the virtual device interfaces and the virtual devices themselves, and those virtual devices can be bound and made available through VFIO. So virtio represents the virtual devices in the virtualized environment, and VFIO sits underneath, exposing those devices to the applications.

Okay, wonderful. That's it from the virtual audience. Okay. Any more questions from you guys? Okay, thank you, everyone.