Hello everyone, welcome to the presentation on PCI endpoint drivers in the Linux kernel and how to write one. My name is Manivannan Sadhasivam. I work as a senior kernel engineer in the Qualcomm Landing Team at Linaro. I primarily do open-source contributions to the Linux kernel, U-Boot and Zephyr, and as part of my job at Linaro I work on upstreaming Linux kernel support for Qualcomm SoCs. I do all of this work while living in Tamil Nadu, the southernmost state of India.
This is the agenda for today's talk. I'm going to show an architectural overview of the PCI subsystem and PCI endpoint devices. Then I'll talk about the internals of the PCI endpoint framework in the Linux kernel and how to write drivers for a PCI endpoint controller and a PCI endpoint function. Then I'll show how to use the PCI endpoint framework, and finally I'll share some of the pain points in using the endpoint framework in a real product like PCI based modems.
This is the overview of the PCI subsystem. We have the root complex connected to the CPU and memory, and inside the root complex we have the host bridge connected to the PCI-to-PCI bridges. The PCI-to-PCI bridge lives on PCI bus zero, and the PCI endpoint devices connected to the downstream port of the bridge live on bus one. In this talk, I'm going to show how to use the Linux kernel inside the PCI endpoint devices.
This is the overview of PCI endpoint devices in the Linux kernel. We have the PCI endpoint controller talking to the PCI bus, and it is responsible for handling all of the PCI bus related activities like TLP generation, link management, etc. The endpoint controller communicates with the endpoint framework inside the Linux kernel, and the framework communicates with the endpoint function, which also lives inside the kernel. Finally, the endpoint function reads and writes the endpoint memory.
At the same time, once the PCI endpoint controller is configured by the framework, it can also read and write the endpoint memory without the intervention of the CPU by using mechanisms such as DMA.
This is the high-level overview of the PCI endpoint framework in the Linux kernel. The framework primarily depends on two device drivers for its operation: the PCI endpoint controller driver and the PCI endpoint function driver. The controller driver is responsible for managing the endpoint controller, which in turn talks to the PCI bus. The function driver is the one implementing the actual use case of the PCI endpoint device. For instance, if you have a PCI based NVMe device, the endpoint function is the one talking to the flash storage, reading and writing data to it, and communicating with the PCI host over the PCI endpoint framework. Then we have the user space component based on configfs, which is used for configuring the PCI endpoint framework.
The PCI endpoint framework has four major components, so let's look into each of them in detail. The first component is pci-epc-core, which manages the interaction between the endpoint function driver and the endpoint controller driver. Its primary job is to pass PCI endpoint events from the endpoint controller to the endpoint function driver. Currently two types of events are supported by the endpoint framework: the core init event and the link up event. This component also manages the creation and deletion of endpoint controllers. Once the endpoint controller driver registers with the framework, the controller becomes visible under /sys/class/pci_epc/ as an entry named after the endpoint controller that was registered with the framework using this component.
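To check which endpoint controllers have registered, it is enough to list that class directory. The following is a minimal user space sketch in Python, not kernel code; the helper name `list_endpoint_controllers` is made up for illustration, and the only thing taken from the talk is the /sys/class/pci_epc location.

```python
from pathlib import Path


def list_endpoint_controllers(class_dir="/sys/class/pci_epc"):
    """Return the names of the endpoint controllers visible in the class directory.

    Each registered controller appears as one entry. If the directory does not
    exist (no endpoint support, or not a Linux system), return an empty list.
    """
    root = Path(class_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    for name in list_endpoint_controllers():
        print(name)
```

On a board where no controller driver has probed yet, the list is simply empty, which is a quick way to tell a missing controller driver apart from a configuration problem later on.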
The second component is pci-epc-mem, which manages the memory space used by the PCI endpoint function. Its primary job is to allocate memory from the address space region specified in the PCI endpoint controller's device tree node. The endpoint function allocates memory from this region using the pci_epc_mem_alloc_addr() API and frees it using the pci_epc_mem_free_addr() API. The memory allocated from the address space region can be used for mapping the PCI host address space in the PCI endpoint device. One of the primary jobs of a PCI endpoint device is to map the PCI host memory and then read and write it, and this component is primarily used for that purpose.
Once the memory is allocated, we need an external entity like the iATU, the internal Address Translation Unit available in Synopsys DesignWare based PCI controllers, for mapping the host memory into the endpoint. The mapping of PCI host memory serves several purposes. One is to generate Message Signaled Interrupts (MSI) to the PCI host: instead of using a dedicated IRQ line for generating the interrupts, we can use the message based interrupt mechanism for signaling IRQ events to the PCI host. Another use case is to read and write arbitrary PCI host memory locations from the endpoint device.
The third component is pci-epf-core, which manages the interaction between configfs and the endpoint function drivers. It is used for the creation and deletion of endpoint function drivers within the endpoint framework. Endpoint function drivers register themselves with the framework using the pci_epf_register_driver() API and unregister using the pci_epf_unregister_driver() API. Similarly, binding and unbinding go through the pci_epf_bind() and pci_epf_unbind() APIs.
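Going back to the address space region managed by the EPC mem component, the allocate and free behaviour can be pictured with a toy model. This is a sketch in Python under stated assumptions, not the kernel implementation: the class name `AddressSpaceWindow`, the 4 KiB page size and the first-fit search are all chosen for illustration, and the model tracks only physical addresses, whereas the kernel API also hands back a virtual address.

```python
class AddressSpaceWindow:
    """Toy page-granular allocator over one address space region (illustrative only)."""

    def __init__(self, phys_base, size, page_size=4096):
        if size % page_size:
            raise ValueError("region size must be a multiple of the page size")
        self.phys_base = phys_base
        self.page_size = page_size
        self.used = [False] * (size // page_size)  # one flag per page

    def _pages(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        return -(-size // self.page_size)  # round up to whole pages

    def alloc(self, size):
        """Return the physical address of the first free run that fits, or None."""
        pages = self._pages(size)
        run = 0
        for i, busy in enumerate(self.used):
            run = 0 if busy else run + 1
            if run == pages:
                start = i - pages + 1
                self.used[start:i + 1] = [True] * pages
                return self.phys_base + start * self.page_size
        return None

    def free(self, phys_addr, size):
        """Mark the pages backing an earlier allocation as free again."""
        pages = self._pages(size)
        start = (phys_addr - self.phys_base) // self.page_size
        self.used[start:start + pages] = [False] * pages


if __name__ == "__main__":
    window = AddressSpaceWindow(0x40000000, 4 * 4096)  # hypothetical base and size
    first = window.alloc(4096)
    second = window.alloc(8192)
    print(hex(first), hex(second))
```

The point of the model is that the region is a small, shared resource: an allocation that does not fit returns nothing, so a function driver has to free what it mapped once it is done with a host buffer.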
The other use of the EPF core component is to allocate memory in the PCI endpoint BAR region for the endpoint function drivers. In a typical PCI endpoint device there will be registers exposed to the PCI host in a BAR region. This component allocates memory in a specific BAR region for the endpoint function drivers so that they can simulate the register interface there, and the PCI host can read and write it. For that purpose it exposes APIs such as pci_epf_alloc_space() and pci_epf_free_space().
The final component is pci-ep-cfs. It manages the interaction with configfs in user space, thereby enabling user space to configure the endpoint. The user space interaction through this component includes the creation and deletion of endpoint functions for the endpoint function drivers: whenever an endpoint function driver gets registered with the framework, we need to create an endpoint function for it so that the driver can be controlled from user space. Then the EPF drivers, the function drivers, need to be bound to the EPC devices. In the kernel there is no real link between the endpoint function driver and the endpoint controller driver, so that link has to come from user space through configfs. Finally, once everything is set up, user space can start and stop the endpoint controller using configfs.
We have looked into the internal components of the endpoint framework, so now we'll see how to write a device driver for the PCI endpoint controller. The first thing is to initialize the endpoint controller, and for that we need to initialize resources such as clocks, resets, PHYs and GPIOs. Usually these resources are obtained from platform specific data such as device tree or ACPI. After initializing the resources, the memory regions need to be initialized.
There are several regions supported by the framework. The first one is the DBI, which stands for Data Bus Interface; this is a memory region specific to the Synopsys DesignWare IP. Then there is a memory region called address space, or MEM, which is used for mapping the PCI host memory in the endpoint device. The endpoint controller's device tree node has to define the address space region so that the endpoint framework can use it for mapping the PCI host memory in the endpoint. Then there is one more region called ATU, which is used for the internal Address Translation Unit, and this is also specific to the Synopsys DesignWare IP. If the endpoint device requires DMA, the driver also needs to obtain the DMA memory region from the platform data.
After obtaining all the memory regions, the endpoint controller needs to be configured. The first thing to do is to configure the controller in endpoint mode. Nowadays most PCI controllers support both RC mode and endpoint mode, so for this use case we need to set the PCI controller to endpoint mode. Then we need to set the link speed and the lane count, configure the L1 and L1SS timings, and finally start the LTSSM. This starts the actual communication with the PCI host over the PCI bus.
The PCI endpoint framework supports two different notifiers, which are primarily used for communicating events from the PCI endpoint controller to the PCI endpoint framework. The first is the core init notifier. It is only used if the endpoint controller depends on an active reference clock from the host. The reference clock can either come from the host or be supplied separately, but if it comes from the host, the endpoint controller cannot be initialized before the reference clock arrives. That is the primary purpose of the core init notifier within the framework.
When the controller depends on the host's reference clock like this, the core_init_notifier flag should be set in struct pci_epc_features. At runtime, once the reference clock becomes active, the endpoint controller driver needs to initialize the endpoint controller and call the dw_pcie_ep_init_notify() API to notify the endpoint function driver that the controller has completed its initialization. If the notifier is not supported, the controller driver is free to do all the initialization at probe time itself.
The second type of notifier is the link up notifier. It is used for signaling link up events from the endpoint controller to the endpoint framework. To make use of it, the driver has to set the linkup_notifier flag available in struct pci_epc_features. Once the link up event is received by the controller driver, it needs to call the dw_pcie_ep_linkup() API to notify the endpoint function about the event.
If the endpoint controller makes use of a DMA engine for offloading PCI reads and writes, the DMA engine also needs to be initialized in the device driver. The first thing is to set up the read and write channels; mostly these channels are defined in platform data such as DT and ACPI. The driver has to request the DMA IRQs for each channel, then allocate and configure the linked list for managing DMA transactions, and finally configure the DMA controller.
The next step is to allocate memory for Message Signaled Interrupts. As I said before, one of the primary requirements of a PCI endpoint device is to signal interrupts to the PCI host, and MSIs are used for that purpose. The endpoint framework exposes two APIs for initializing and allocating the memory for MSI.
The pci_epc_mem_init() API is used for initializing the MSI memory, and the pci_epc_mem_alloc_addr() API is used for allocating the memory for MSI from the address space region specified in the endpoint controller's device tree node, or it can even come from ACPI. This API returns both the virtual address and the physical address of the allocated memory.
The next thing is setting up the PCI host memory mapping in the endpoint device. Once the memory for MSI is allocated, it needs to be mapped so that writes to the MSI address space trigger an MSI to the PCI host. For that, the endpoint device has to use an external entity like the iATU for mapping the PCI host memory in the PCI endpoint device. In addition to MSI, the PCI endpoint device may also need to access arbitrary locations of the PCI host memory. For that purpose, the driver has to set up mapping windows to be used at runtime, along with the memory alignment and the limit for each window.
The next thing is to enable the endpoint IRQs. A PCI endpoint device can make use of the PERST# signal defined in the PCI spec. PERST# can be used to signal the clock and power ready event from the PCI host to the PCI endpoint. The endpoint controller driver needs to assign an interrupt to PERST# so that it can catch that IRQ and use it for preparing the endpoint framework. Usually the PERST# IRQ is served by a sideband GPIO, a special purpose GPIO defined by the PCI spec. If the endpoint controller supports any other controller specific IRQs, such as link events, those also need to be allocated and assigned in the driver.
The final stage is to register the endpoint controller driver with the endpoint framework. For that purpose, the pci_epc_create() API can be used.
When calling pci_epc_create(), the controller driver needs to pass a struct pci_epc_ops with the function pointers defined by the PCI endpoint framework. These function pointers are used by the framework at runtime for different purposes. For instance, the write_header function pointer is used for writing to the PCI configuration space of the PCI endpoint device. Function pointers such as set_bar and clear_bar are used for configuring the BAR regions of the endpoint. There are also function pointers for setting and getting the MSI counts of the endpoint device, and for starting and stopping the controller.
Okay, so we have seen how to write the PCI endpoint controller driver in the Linux kernel. Now we are going to see how to write the PCI endpoint function driver. As I said before, the endpoint function driver is the one implementing the actual use case of the PCI endpoint device, like reading and writing the flash storage in the case of NVMe, or talking to the modem DSP in the case of PCI based modems.
The first step in writing an EPF driver is to register the driver with the endpoint framework. For that, the pci_epf_register_driver() API can be used. With this API, the driver needs to pass a struct pci_epf_driver, which has function pointers for probe and remove. These function pointers are called by the endpoint framework whenever the driver gets registered with the framework and removed from it. The function driver also needs to pass in one more structure called struct pci_epf_ops, which defines a set of function pointers called bind, unbind and add_cfs. The bind and unbind pointers are called whenever the binding operation happens between the endpoint controller device and the endpoint function driver.
The add_cfs function pointer is called whenever the function driver needs to initialize its function-specific configfs attributes in addition to the ones already exposed by the endpoint framework. Finally, the endpoint function driver also needs to populate the struct pci_epf_device_id for uniquely identifying the function driver. It needs to provide the name of the PCI EPF driver, and it can optionally pass driver data into the probe and remove functions.
Once the endpoint function driver is registered successfully with the framework, it needs to service the notifications from the framework at runtime. As we saw before, if the core init notifier is supported, the endpoint function driver has to implement support for catching the event from the endpoint framework and acting upon it. On the occurrence of this event, the endpoint function driver has to write the PCI endpoint configuration space using the write_header callback. Then it should set up the BAR regions using the EPC set_bar callback, and it should also set up MSI and MSI-X using the set_msi and set_msix callbacks. Additionally, if the link up notifier is also supported, then on the occurrence of that event the endpoint function driver has to request the DMA channels, if it is using DMA, and then start the actual function of the endpoint function driver.
Okay, so we have seen the overview of the PCI endpoint framework in the Linux kernel and how to write the drivers for the endpoint controller and the endpoint function. Now we'll see how to use the endpoint framework. The first step is to boot the PCI host and the PCI endpoint devices, and then load the PCI endpoint controller driver and the function driver.
Then, once user space is up, mount configfs so that we can configure the endpoint framework from user space. The first step of the configuration is to create the endpoint function device. The second step is to create a link between the function driver and the controller driver. This is done by creating a symlink to the endpoint function under the controller's own directory inside the controllers directory in configfs. Once the link is in place, the controller can be activated by writing 1 to the start file in that controller's directory. Finally, once the operations are done, the controller can be stopped by writing 0 to the same start file.
As I said before, the PCI endpoint framework serves its purpose, but there are also pain points while using it in a real product. The first problem is that the probe of the PCI endpoint controller driver depends on the active reference clock from the host in the core init notifier case. If the device requires an active reference clock from the host, the controller needs to wait for it, but the probe of the endpoint controller driver should not have to wait. In the current situation, the probe of the driver also has to wait. There is an active discussion on the mailing list to fix this issue.
Second, there is currently no way to configure the endpoint framework from within the kernel, without configfs, so user space always comes into the picture. If you have a use case where the kernel itself needs to configure the PCI endpoint framework, it cannot be done now.
Third, there is currently no device tree integration for the endpoint framework. This is acceptable to some extent, because the endpoint function is really a software implementation inside the kernel.
But in the future, there may be PCI endpoint function drivers that depend on actual hardware blocks for carrying out their duties. In those cases, we need a device tree representation for the endpoint framework so that the function can get its resources from the device tree.
Finally, the use of the notifier mechanism inside the framework forces atomic context on the endpoint function drivers as well. Even though the caller of the notification, the endpoint controller driver, might not be in atomic context, the callback implemented in the EPF driver has to run in atomic context. This is primarily due to the use of an atomic notifier chain in the framework. There is a discussion going on in the mailing list to fix this issue as well. So sooner or later we'll have all of these pain points addressed in the framework itself.
So that's it for the presentation. I hope you have enjoyed it. If you have any feedback, please reach out to me over email or IRC. Thank you.
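For reference, the configfs sequence described in the usage part of the talk (create the endpoint function, link it under the controller, write 1 to start, write 0 to stop) can be sketched from user space as follows. This is a minimal Python sketch under assumptions that go beyond the talk: configfs is taken to be already mounted with the endpoint group at /sys/kernel/config/pci_ep, the helper names are made up, and the driver, function and controller names in the usage comment are placeholders.

```python
import os
from pathlib import Path

# Assumed location of the endpoint configfs group once configfs is mounted.
DEFAULT_ROOT = "/sys/kernel/config/pci_ep"


def configure_endpoint(function_driver, function_name, controller, root=DEFAULT_ROOT):
    """Create an endpoint function, link it to a controller and start the controller."""
    root = Path(root)
    func_dir = root / "functions" / function_driver / function_name
    ctrl_dir = root / "controllers" / controller

    # Step 1: create the endpoint function device for the function driver.
    func_dir.mkdir()
    # Step 2: link the function under the controller directory.
    os.symlink(func_dir, ctrl_dir / function_name)
    # Step 3: activate the controller.
    (ctrl_dir / "start").write_text("1")
    return func_dir


def stop_endpoint(controller, root=DEFAULT_ROOT):
    """Stop a controller that was started earlier."""
    (Path(root) / "controllers" / controller / "start").write_text("0")


# Example with placeholder names (needs root privileges on a real endpoint):
#   configure_endpoint("some_epf_driver", "func1", "some-epc-name")
#   stop_endpoint("some-epc-name")
```

The functions and controllers directories for the loaded drivers are expected to exist already, which is why step 1 creates only the last path component.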