Hello, everyone, my name is Priyavikshit. I work at Samsung Semiconductor in JRND Santa. Today I will present the topic "Understanding the Linux interrupt subsystem". In this presentation, we will cover what interrupts are and how they are triggered, the various types of interrupts, and what interrupt controllers are and why they matter. Then comes generic interrupt handling, where we will check how interrupts are handled regardless of the interrupt controller used. The main part of the presentation is understanding the generic interrupt subsystem framework. Under this, we will see a few important data structures and how they are linked to each other, followed by the generic interrupt handling process. We will then see the high-level driver APIs and how different drivers can call these APIs without knowing anything about the interrupt hardware details. This way, the high-level APIs can be used on different platforms without code changes. After that, we will check the currently running interrupts from user space, and then the various configurations related to debugging interrupts.
Why are interrupts needed? Many times a hardware device needs to communicate with the CPU, for example when an error has been generated or when the device has some request. For the CPU to process these requests, we can use two methods: polling and interrupts. In the polling method, the CPU keeps checking whether any device has a request, which decreases the throughput of the system. The CPU sits in a loop checking the hardware device, irrespective of whether the device wants to communicate or not. The second method is the interrupt method. This is the preferred method, where the CPU attends to the hardware only when a hardware device has some request. So in interrupt mode, the processor executes its main program and stops only when a hardware device interrupts the CPU.
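To make the difference between the two methods concrete, here is a minimal Python sketch. It is only a toy model and not kernel code: the function name, the tick-based timeline, and the request times are all assumptions made for illustration. It counts how often the CPU has to touch the device under polling and under interrupts.

```python
def compare_polling_and_interrupts(request_ticks, total_ticks):
    """Count CPU involvement under polling and under interrupts.

    request_ticks: ticks at which a hypothetical device has a request.
    total_ticks:   length of the simulated timeline.
    """
    requests = {t for t in request_ticks if 0 <= t < total_ticks}

    # Polling: the CPU checks the device on every tick, request or not.
    polling_checks = 0
    polling_serviced = 0
    for tick in range(total_ticks):
        polling_checks += 1
        if tick in requests:
            polling_serviced += 1

    # Interrupts: the CPU is diverted only when the device signals.
    interrupt_entries = 0
    interrupt_serviced = 0
    for _tick in sorted(requests):
        interrupt_entries += 1
        interrupt_serviced += 1

    return {
        "polling_checks": polling_checks,
        "polling_serviced": polling_serviced,
        "interrupt_entries": interrupt_entries,
        "interrupt_serviced": interrupt_serviced,
    }


if __name__ == "__main__":
    print(compare_polling_and_interrupts({3, 40, 77}, total_ticks=100))
```

Both schemes service the same requests in this model. What changes is how many times the CPU is pulled away from its main program to do so.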
Linux handles interrupts in much the same way it handles signals in user space. Generally speaking, an interrupt is a physically produced electronic signal originating from a hardware device and directed into the processor. The processor detects the signal, interrupts its current execution, and calls a special routine that services the interrupt. Once the servicing is completed, the processor resumes exactly where it left off. The event that causes the interruption is called an interrupt, and the special routine executed to service it is called the interrupt service routine. Using interrupts, the overall system throughput can be increased, which is why this is the preferred method.
Interrupts can be of many different types based on how we classify them. Let us start with hardware and software interrupts. Interrupts caused by external signals are referred to as hardware interrupts, and interrupts caused by special instructions are called software interrupts. Timer ticks, the network card, and the keyboard are examples of hardware interrupts, while the most common examples of software interrupts would be a page fault, an overflow instruction, division by zero, et cetera. Then there are maskable and non-maskable interrupts. Simply put, interrupts that can be ignored are maskable and interrupts that cannot be ignored are non-maskable. Usually the interrupts that cannot be delayed and must be processed by the processor immediately are non-maskable; this could be a critical event such as a hardware failure. On the other hand, maskable interrupts can be delayed when a much higher priority interrupt occurs, which puts the other interrupts in a wait state. All the interrupt requests issued by input-output devices give rise to maskable interrupts.
Let us see a few more types, starting with shared interrupts. The basic need for shared interrupts comes from the limited number of interrupt lines available on every SoC.
Ideally, one IRQ line serves one device, as we saw in the previous diagram, but this is not enough in some cases. As a solution, modern hardware has been designed to allow sharing an interrupt line among several devices. So when two or more devices use the same interrupt line, or the same IRQ number, it is called a shared interrupt.
A spurious interrupt is a signal of very short duration on one of the interrupt lines. It is most likely caused by a signal glitch. We can say spurious interrupts are false interrupts or bad interrupts. There is a separate file, spurious.c, in the Linux kernel for handling these kinds of interrupts.
The ARM-specific GIC, that is, the Generic Interrupt Controller, has different types of interrupt sources: PPI, SPI, and SGI. We will discuss interrupt controllers in an upcoming slide, so for the time being let's discuss this classification. Private peripheral interrupts, PPI: these peripheral interrupts are private to one core and cannot be handled on another core. An example of a PPI would be the generic timer for each core. Then shared peripheral interrupts, SPI: these are peripheral interrupts with no such restriction, which can be delivered to any of the cores. That is, they are shared among the different cores. Then there are software generated interrupts, SGI. SGIs are typically used for inter-processor communication, and they are generated by writing to a specific SGI register in the GIC.
Each interrupt signal input is designed to be either level-sensitive or edge-sensitive. A level-sensitive input continuously requests processor service as long as a particular level, high or low, is applied to the input. An edge-sensitive input reacts to signal edges, either rising or falling, so it is present for a comparatively shorter duration. We can say level-triggered signifies a state while edge-triggered signifies an event.
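The "state versus event" distinction can be illustrated with a small Python sketch. This is only a model over a list of sampled line values, with an active-high line and rising-edge detection assumed for simplicity; the function names are made up for this example.

```python
def level_asserted(samples, active=1):
    """Level-triggered view: the request is present at every sample
    where the line sits at the active level (a state)."""
    return [i for i, value in enumerate(samples) if value == active]


def rising_edges(samples):
    """Edge-triggered view: one event per inactive-to-active
    transition (an event). The line is assumed to start inactive."""
    events = []
    previous = 0
    for i, value in enumerate(samples):
        if previous == 0 and value == 1:
            events.append(i)
        previous = value
    return events


if __name__ == "__main__":
    line = [0, 1, 1, 1, 0, 0, 1, 0]
    print("level asserted at samples:", level_asserted(line))
    print("rising edges at samples:  ", rising_edges(line))
```

For the same waveform, the level view keeps reporting the request for as long as the line stays active, while the edge view reports each activation exactly once.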
When assigning a trigger type to interrupts, we should keep a few things in mind. If an interrupt is configured as level-triggered, it is impossible to lose it, but we need to take extra care to clear it. For edge-triggered, you get an interrupt when the line changes from the inactive to the active state, but only once. We can also say that a level-triggered interrupt requires feedback to the interrupt source in order to reset it. This feedback may be some sort of acknowledgement, or disabling the interrupt line for the time being; anything will work. Edge-triggered doesn't need it because it is just a pulse. But this can also be seen as a drawback of edge-triggered interrupts, because hardware-generated glitches, the spurious interrupts we mentioned, may falsely assert an edge-triggered interrupt. With level-triggered interrupts, the interrupt handler has to poll each of the devices sharing the interrupt line to determine which device actually caused the interrupt. So interrupt sharing is also a key difference between edge-triggered and level-triggered: while level-triggered interrupts can be shared, edge-triggered interrupts should not be.
Interrupt controller. The interrupt controller's main function is to manage interrupts coming from different hardware devices. Interrupt lines are often limited, so the interrupt controller multiplexes several possible input sources on the platform and sends the highest-priority interrupt to the processor, one at a time. It does so based on the type of interrupt: a non-maskable interrupt is given a higher priority, and similarly we can assign different priorities to different maskable interrupts. Also, in the interrupt controller we can mask interrupts that should not be delivered at a particular time. We can set different priorities for different interrupts such that the higher priority one is serviced first, and then we take the next lower priority interrupt.
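The masking and priority behavior just described can be sketched as a toy model in Python. This is not how any real controller is programmed: the class, its methods, and the convention that a lower number means a more urgent interrupt are all assumptions of this sketch.

```python
class ToyInterruptController:
    """Toy model: many input lines, one interrupt delivered at a time."""

    def __init__(self):
        self.priority = {}    # line -> priority (lower = more urgent, by assumption)
        self.masked = set()   # lines that must not be delivered right now
        self.pending = set()  # lines that have been raised and not yet delivered

    def configure(self, line, priority):
        self.priority[line] = priority

    def mask(self, line):
        self.masked.add(line)

    def unmask(self, line):
        self.masked.discard(line)

    def raise_line(self, line):
        if line not in self.priority:
            raise ValueError(f"line {line} is not configured")
        self.pending.add(line)

    def next_interrupt(self):
        """Return the most urgent unmasked pending line, or None."""
        candidates = [l for l in self.pending if l not in self.masked]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda l: (self.priority[l], l))
        self.pending.remove(chosen)
        return chosen


if __name__ == "__main__":
    ctrl = ToyInterruptController()
    ctrl.configure(0, priority=0)  # e.g. a timer
    ctrl.configure(1, priority=5)  # e.g. a UART
    ctrl.configure(2, priority=2)  # e.g. a network card
    ctrl.mask(1)
    for line in (1, 2, 0):
        ctrl.raise_line(line)
    print(ctrl.next_interrupt(), ctrl.next_interrupt(), ctrl.next_interrupt())
```

A masked line stays pending in this model, so it is delivered once it is unmasked rather than being lost.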
For a multi-core system, we can set interrupt affinity as well, which means a particular interrupt with affinity to core zero will run on core zero only. As we saw earlier with shared interrupts, using the interrupt controller we can also assign a common interrupt line to multiple hardware devices. The interrupt controller also helps reduce interrupt latency. This can be done by automatically saving and restoring the register contents, which results in a reduced delay. There are various types of interrupt controllers available, like the VIC, that is, Vectored Interrupt Controller, the PIC, Programmable Interrupt Controller, and the NVIC, Nested Vectored Interrupt Controller, to name a few. In the upcoming slides, we have chosen the GIC, that is, the Generic Interrupt Controller from ARM, as an example.
Till now we have covered what interrupts are, their types, how they are triggered, and the need for an interrupt controller. We will now see how various interrupt controllers use the generic interrupt framework. The generic interrupt handling layer is designed to provide a complete abstraction of interrupt handling for device drivers. A device driver uses generic APIs to request, enable, disable, and free IRQs. These drivers do not know anything about the interrupt hardware details, so they can be used on different platforms without any code changes. In the upcoming slides, we will see a few important structures that are used in the generic IRQ subsystem implementation, their relationship with each other, and how they are used in the generic interrupt APIs. So the bottom line is to understand how interrupts are served irrespective of which interrupt controller is being used.
The important data structures used in the generic interrupt framework are irq_domain, irq_desc, irq_data, and irq_chip. These structures are linked to each other in some way.
In the coming slides, we will see these structures and their uses in detail, but for now let's see the relationship between them. Here we can see that irq_data is passed down to the irq_chip functions, and irq_data also contains a pointer to the irq_chip structure and the irq_domain structure, while irq_data itself is embedded in irq_desc. So we can say that irq_chip and irq_desc are also linked to each other through irq_data. We will jump back to this slide after we understand all these structures in detail.
The first structure is irq_domain. This structure is used for hardware interrupt number translation. What is interrupt number translation and why is it needed? We will see this in this slide. Let us start simple. The interrupt number that the kernel uses for a particular device is called the Linux IRQ number, and the interrupt controller's local interrupt number is called the hardware IRQ number. When there is only a single interrupt controller in the system, the same number can be used both as the Linux IRQ number and as the controller-local number, that is, the hardware IRQ. But in a system with multiple controllers, the kernel must ensure that each one gets assigned non-overlapping allocations of IRQ numbers. For this reason, we need a mechanism to separate the controller-local interrupt number from the Linux IRQ number. The controller-local number is mapped using the irq_domain operations, which we will see in the next slide. These use the irq_domain structure. The irq_domain structure includes irq_domain_ops as a member. Then name is the name of the interrupt domain. host_data is a private data pointer for use by the particular interrupt controller; it is not touched by the irq_domain core code. Then there are some flags we may assign. of_node is the pointer to the device tree node associated with the irq_domain. This is used when decoding device tree interrupt specifiers.
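To illustrate why the controller-local number has to be kept separate from the Linux IRQ number, here is a toy Python model of the translation. It is a sketch only and does not reproduce the kernel's implementation: the class and method names are hypothetical, and using 0 to mean "no mapping" is an assumption of this model.

```python
import itertools


class ToyIrqDomain:
    """Toy model of one controller's hwirq <-> Linux IRQ translation."""

    # One allocator shared by every domain, so Linux IRQ numbers never
    # overlap even when two controllers use the same local numbers.
    _next_linux_irq = itertools.count(1)

    def __init__(self, name):
        self.name = name
        self._hwirq_to_linux = {}

    def create_mapping(self, hwirq):
        """Return the Linux IRQ for hwirq, allocating one on first use."""
        if hwirq not in self._hwirq_to_linux:
            self._hwirq_to_linux[hwirq] = next(ToyIrqDomain._next_linux_irq)
        return self._hwirq_to_linux[hwirq]

    def find_mapping(self, hwirq):
        """Look up the Linux IRQ for hwirq; 0 means 'no mapping' here."""
        return self._hwirq_to_linux.get(hwirq, 0)


if __name__ == "__main__":
    gic = ToyIrqDomain("gic")
    gpio = ToyIrqDomain("gpio")
    # Both controllers have a local interrupt number 5.
    print("gic hwirq 5  ->", gic.create_mapping(5))
    print("gpio hwirq 5 ->", gpio.create_mapping(5))
    print("gic lookup   ->", gic.find_mapping(5))
```

The same local number 5 on two controllers ends up with two different Linux IRQ numbers, and each domain can translate its own local number back on lookup.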
So we can summarize by saying that irq_domain helps in translating the device tree interrupt representation into the hardware IRQ number, which can then be mapped back to a Linux IRQ number without any extra platform code.
These are all the possible APIs for domain operations. Here we see that the irq_domain structure is passed to each of the operations. The match operation matches an interrupt controller node to the host, the map operation creates or updates the mapping between a virtual IRQ and a hardware IRQ, and unmap deletes such a mapping. The mapping between a Linux IRQ number and a hardware IRQ number is created by calling one function, irq_create_mapping. This also allocates a new irq_desc structure and associates it with the hardware IRQ number. The irq_desc structure is also set up by the time we return from this map function. irq_desc is another important structure that we will look at in detail in the coming slides. When an interrupt is received, we can call the irq_find_mapping function to find the Linux IRQ number from the hardware IRQ.
In the previous slide, we saw that many irq_domain operations are provided, but it is up to a particular interrupt controller which of them it wants to define. Here, in the case of the GIC that we took as the example, only a few of the domain operations are defined. How these operations are defined also depends on the particular interrupt controller. In this example, gic_irq_domain_alloc does the prime task of IRQ domain mapping, but in its own way, using the generic irq_domain structure.
Here are the key points to note. The irq_domain is tied to the node of the interrupt controller in the device tree. This structure holds the relation between the global interrupt number and the local one. The irq_alloc_desc and irq_free_desc APIs provide allocation and deallocation of IRQ numbers. When an interrupt is received, the irq_find_mapping function should be used to find the Linux IRQ number from the hardware IRQ number.
The irq_desc structure.
Each interrupt has its own interrupt descriptor structure, called irq_desc. In the previous slide, we saw how the domain operations allocate the irq_desc structure with all the details. This is a very big structure, so let us try to understand a few of its important members. irq_data is a member of irq_desc; we will see a lot about irq_data in the coming slides. Here we also have the interrupt flow handler, handle_irq, as a member. Then action defines the interrupt action chain, that is, the actions to be taken when the interrupt occurs. This structure holds various information describing the interrupt, like kstat_irqs, which gives the IRQ stats per CPU, then the total IRQ count, et cetera. debugfs_file is the dentry for the debugfs file, and name is the flow handler name for the /proc/interrupts output. These are quite helpful in debugging interrupts. The important thing is that whenever an interrupt triggers, the low-level architecture code calls into the generic interrupt code through the handle_irq function. Let us have a quick recap of the irq_desc structure. It contains all the core information, includes the interrupt handler function, provides a one-to-one mapping to the Linux interrupt number, and struct irq_data is embedded here.
irq_data. irq_data is the per-IRQ chip data. This structure is passed down to the irq_chip functions, which come next. Let us look at what the members do. Here mask is a precomputed bitmask for accessing the chip registers. irq is the interrupt number. hwirq is the hardware interrupt number, local to the interrupt domain. common points to the data shared by all IRQ chips. chip is the low-level interrupt hardware access, which we will see more of in the coming slide. domain is the interrupt translation domain, which is responsible for mapping between the hardware IRQ and Linux IRQ numbers. This we have already covered in the previous slides. So let us have a recap of irq_data.
It is embedded in the irq_desc structure. It contains both the hardware IRQ number and the Linux IRQ number, contains a pointer to the irq_chip structure, and provides the link between the irq_chip and irq_desc structures.
irq_chip. irq_chip is the hardware interrupt chip descriptor, which includes all the functions defined by the interrupt chip, or we can say controller. Here name is the name shown in /proc/interrupts. irq_enable enables the interrupt, irq_disable disables the interrupt, and so on. This is basically a long list of all the interrupt chip functions that can be defined by a particular interrupt controller. So we have covered what the irq_chip structure is. Let us have a quick recap. The structure is used to interact with the hardware at a very low level. It is a set of methods describing how to drive the interrupt controller, called directly by the core IRQ code. As we showed for irq_domain, according to the needs of a particular interrupt controller, we can choose which functions from the generic irq_chip structure to implement for the GIC irq_chip. In this GIC irq_chip, a few of the functions are implemented, like irq_mask, irq_unmask, irq_get_irqchip_state, and irq_set_irqchip_state. We will see the last two functions in the coming slide.
So we can set or get the state of an interrupt using these functions, which are part of the irq_chip structure. irq_get_irqchip_state returns the current state of the interrupt, while with irq_set_irqchip_state we can set the current state of the interrupt. These are the states: pending, active, masked, and line level high.
Let us go back to the slide on the relation between all these structures before going to generic IRQ handling. As we saw, the irq_domain operations allocate the irq_desc structure with all the details. irq_data is passed down to the irq_chip functions. irq_data also contains a pointer to the irq_chip and irq_domain structures. And irq_data itself is nested inside irq_desc.
So there is an indirect link between irq_chip and irq_desc through irq_data.
Now we will check how the generic interrupt framework handles an interrupt. When the interrupt occurs, the low-level architecture code either calls the handle_irq function directly or calls the generic_handle_irq_desc function, though generic_handle_irq_desc also calls the same handle_irq function. This we will see in the next slide. Depending on the trigger type of the interrupt, different actions may need to be taken in each case. The high-level IRQ flow handlers provide a predefined approach to dealing with hardware interrupts. These flow handlers are assigned to the interrupt descriptor during device initialization or at boot time. A few of these IRQ flow handlers are: handle_simple_irq, which handles simple and software-decoded IRQs; we do not track interrupt counts, et cetera, for these IRQs. Then handle_level_irq handles level-triggered interrupts, and handle_percpu_irq is the per-CPU local interrupt handler. Since this is local to a particular CPU, it doesn't need any locking mechanism. So we can use whichever function best suits the interrupt.
Let's say the selected flow handler is one of the high-level IRQ flow handlers. Then it may need to do a few things before proceeding with the interrupt handling. For example, the interrupt may need to be acknowledged at the interrupt controller to make sure that it was properly received. We may also need to enable or disable a few of the interrupts at the chip. Here comes the advantage of having the generic structures that we defined earlier. The first thing we need to do is to enable the IRQ. The flow handler doesn't need to know anything about architecture-specific details to accomplish these actions. All of it can be done by relying on the irq_chip abstraction, which encapsulates the hardware-relevant functions.
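The idea that a flow handler only talks to the chip abstraction can be sketched as a toy Python model. It is loosely modeled on a level-type flow (mask and acknowledge, run the handlers, unmask) and leaves out locking, state flags, and statistics; every class and function name here is hypothetical.

```python
class ToyChip:
    """Stands in for the irq_chip callbacks of one controller.
    A real controller would touch hardware registers here."""

    def __init__(self):
        self.log = []

    def mask_ack(self, hwirq):
        self.log.append(("mask_ack", hwirq))

    def unmask(self, hwirq):
        self.log.append(("unmask", hwirq))


class ToyDesc:
    """Stands in for an interrupt descriptor."""

    def __init__(self, hwirq, chip, flow_handler):
        self.hwirq = hwirq
        self.chip = chip
        self.handle_irq = flow_handler  # chosen at init time
        self.actions = []               # handlers registered by drivers


def toy_handle_level(desc):
    """Level-style flow: silence the line, run handlers, re-enable."""
    desc.chip.mask_ack(desc.hwirq)
    results = [action(desc.hwirq) for action in desc.actions]
    desc.chip.unmask(desc.hwirq)
    return any(results)


def toy_generic_handle(desc):
    """Entry point: simply dispatch to the descriptor's flow handler."""
    return desc.handle_irq(desc)


if __name__ == "__main__":
    chip = ToyChip()
    desc = ToyDesc(hwirq=42, chip=chip, flow_handler=toy_handle_level)
    desc.actions.append(lambda hwirq: True)  # a driver handler that claims the IRQ
    print(toy_generic_handle(desc), chip.log)
```

Swapping in a different ToyChip would change what happens at the hardware level without touching toy_handle_level, which is the point of the abstraction.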
Afterward, the handle_irq_event function is called, which sets the IRQ state as in progress, releases the IRQ descriptor lock while the handlers run, and calls the handle_irq_event_percpu function. We will look at this handle_irq_event function in the coming slide. But if we choose handle_percpu_irq as the flow handler, then the number of IRQs handled by the CPU is incremented and we jump directly to the handle_irq_event_percpu function. That is, instead of going through handle_irq_event, which then calls handle_irq_event_percpu, the handle_irq_event_percpu function is called directly. Inside handle_irq_event_percpu, we call __handle_irq_event_percpu and then use what it returns to add some randomness. __handle_irq_event_percpu is where the generic IRQ subsystem runs the actions registered for the interrupt. So we can say this is the main function where the interrupt-specific handling occurs. After all the actions are handled, IRQ_HANDLED is returned, and __handle_irq_event_percpu also returns the flags, which can be used to add randomness, as mentioned earlier.
As we discussed earlier, we can see that generic_handle_irq_desc just calls the same handle_irq function. And here is handle_irq_event: when this function is called, it sets the IRQ state as in progress, releases the lock, and calls handle_irq_event_percpu. Once handle_irq_event_percpu returns, it takes the lock again, clears the in-progress status, and returns.
Here we have another type of flow handler, which is used for bad interrupts, or we can say the spurious interrupts that we discussed in the interrupt types slides. For bad interrupts, it prints details of the IRQ descriptor, then increments the interrupt count and acknowledges the bad IRQ.
This ack_bad_irq function is architecture specific; that is, it depends on each architecture how it wants to acknowledge these bad interrupts.
Let us quickly recap what we understood about the generic interrupt flow. When the interrupt occurs, the handle_irq function stored in the irq_desc structure is called directly, or we can call the generic_handle_irq_desc function. handle_irq_event sets the IRQ state as in progress, releases the IRQ descriptor lock, and then calls handle_irq_event_percpu. Inside handle_irq_event_percpu, we call __handle_irq_event_percpu and add some randomness to the pool from the interrupts handled by the CPU. __handle_irq_event_percpu takes all the actions required for the interrupt, based on the irq_desc, and then the interrupt is cleared. To handle spurious or bad interrupts, we call the handle_bad_irq function.
Now we will look at a few generic interrupt APIs that can be called directly from a device driver without needing to know how they are implemented. First is request_irq, which adds a handler for an interrupt line. Here irq, the first parameter, is the interrupt number. The second parameter, handler, is a function pointer to the actual interrupt handler that will service this interrupt. Its type is irq_handler_t, and the handler returns an irqreturn_t. If the return value is IRQ_HANDLED, it indicates that the interrupt was handled successfully, but if the value is IRQ_NONE, it indicates that the interrupt was not from this device or was not handled. The third parameter, flags, may be either zero or a bitmask of one or more flags. We will see the different interrupt flags in a coming slide. The fourth parameter is the device name. This text is what shows up under /proc/irq and in /proc/interrupts for communication with user space. And finally, the fifth parameter is dev_id, which is particularly used for shared interrupts.
We can pass NULL here if the line is not shared, but for a shared line we must pass a unique device ID, which is used to differentiate between the multiple handlers of a shared interrupt. free_irq frees an interrupt allocated using request_irq. Here we remove the interrupt handler, and if the interrupt line is no longer in use by any driver, the line is also disabled. Then enable_irq enables handling of the interrupt, and disable_irq is used to disable the selected interrupt line. This function waits for any pending IRQ handlers for this interrupt to complete before returning. Then disable_irq_nosync: unlike disable_irq, this function does not ensure that existing instances of the IRQ handler have completed before returning. There are many other high-level driver APIs, like irq_set_irq_type, which sets the trigger type for an interrupt, and request_nmi, which allocates an interrupt line specifically for non-maskable interrupt delivery, and many more. Another set of APIs is in_irq and in_interrupt. These are used to check whether we are currently in an interrupt handler. Then for disabling all interrupts, we can use the local_irq_save and local_irq_disable APIs. A call to local_irq_save disables interrupt delivery on the current processor after saving the current interrupt state into flags, while local_irq_disable disables interrupt delivery without saving the interrupt state.
Interrupt flags are only used by the kernel as part of the interrupt handling routine. All the flags available in the interrupt subsystem are defined in the linux/interrupt.h file. Let us discuss a few of the important flags. IRQF_SHARED: this flag allows sharing of an IRQ among different devices. Simply put, we use this flag for shared interrupts. Then we have IRQF_TIMER, which marks the interrupt as a timer interrupt. Then IRQF_PERCPU signifies that the interrupt is per CPU, and IRQF_NO_SUSPEND signifies that we should not disable this IRQ during suspend.
And there are many more flags here, as you can see. In particular, IRQF_NOBALANCING is used to exclude this interrupt from IRQ balancing. By interrupt balancing we mean distributing hardware interrupts across the processors to improve system performance. And then we have other flags like IRQF_NO_AUTOEN, which signifies that the IRQ should not be enabled automatically when the user requests it. The user will enable it explicitly later using the enable_irq or enable_nmi functions. NMI is a non-maskable interrupt.
Let us take a simple example to understand how a device driver uses these generic APIs. Here, in the WDT device driver, we can see that the wdt_init function calls the generic request_irq API to register the interrupt, with the same parameters we just discussed. This way the generic interrupt subsystem helps in implementing an interrupt handler without needing to know the internal details. Here irq is the interrupt number, and wdt_interrupt is the interrupt handler. No flags are set, because the third parameter is zero. Then wdt501p is the device name associated with the interrupt. And NULL is passed as the last argument, which indicates that the interrupt is not shared. Then when the WDT driver exits, we free the IRQ handler by calling free_irq and passing the same interrupt number that we used in request_irq.
Now we have a few slides to show how interrupts look from user space. The procfs interrupt interface shows all the active IRQs. /proc/irq is the directory used to set the CPU affinity for a particular interrupt. As we discussed earlier, affinity allows the system to tie a particular IRQ to only one CPU, or to exclude a CPU from handling any IRQs. smp_affinity is a bitmask in which we can specify which CPUs can handle the IRQ, just by writing to it from user space.
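Since smp_affinity is a hexadecimal bitmask with one bit per CPU, a small Python sketch can show how such a mask relates to a list of CPUs. The helper names are made up for this example, and the sketch only computes and decodes the mask string. Actually writing it to an IRQ's smp_affinity file needs root and a real IRQ number, so that step is left out.

```python
def cpus_to_mask(cpus):
    """Build the hex mask string for a collection of CPU numbers.
    Bit N of the mask corresponds to CPU N."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers cannot be negative")
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Decode a hex mask string back into a sorted list of CPU numbers.
    Commas, which separate 32-bit groups in long masks, are ignored."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


if __name__ == "__main__":
    print(cpus_to_mask(range(8)))  # all of CPUs 0..7
    print(cpus_to_mask([0]))       # only the first CPU
    print(mask_to_cpus("c"))       # which CPUs does mask 'c' allow?
```

With these helpers, a mask of ff decodes to CPUs 0 through 7, and restricting an interrupt to the first CPU corresponds to the mask 1.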
For example, here the default smp_affinity was set to ff, and we change it to 1 by writing the value directly to the file. It means that only the first CPU will now handle this IRQ. The default_smp_affinity mask applies to all non-active IRQs, which are the IRQs that have not yet been allocated or activated and hence lack a /proc/irq directory of their own. Examples of those inactive IRQs here are 20 to 36, then 51, 53, et cetera.
In this slide, we will see how interrupts look from user space. We can see the details of all interrupts by checking the /proc/interrupts file. Here the first column is the IRQ number of each interrupt. The IRQ number determines the priority of the interrupt that needs to be handled by the CPU; a smaller IRQ number means a higher priority. Then we have columns stating how many times each CPU core has been interrupted, in a multi-core system. Here we can see that by default most of the interrupts are handled by core 0 only. So by default, interrupts will run on core 0, but we can change the affinity of these interrupts by writing to smp_affinity. Then we have the irq_chip name, then the hardware IRQ number, that is, the interrupt controller's local number, then the type of interrupt, and the interrupt handler. At the very end we can see a few IPIs, the inter-processor interrupts. An inter-processor interrupt, IPI, is a special type of interrupt by which one processor may interrupt another processor in a multi-processor system. If the interrupting processor requires some action from another processor, the inter-processor interrupt allows a CPU to send an interrupt signal to the other processor in the system. So an IPI is not delivered through an IRQ line, as you can see, but directly as a message. We can also expose interrupt information through the sysfs interface, as you can see.
The interrupt information is already exposed via /proc/interrupts, so why was the sysfs interface added? Because the format of that file has changed over kernel versions and differs across architectures. It also has a varying number of columns depending on the hardware. For example, on a single core there will be just CPU0. So the format varies from hardware to hardware, and this makes it very hard to parse. To solve this, we expose the information through sysfs, where each IRQ attribute is in a separate file, so it can be parsed in a consistent, machine-parsable way. Here we can see all the attributes are present, like actions, chip_name, hwirq, name, wakeup, et cetera. But this feature is only available when CONFIG_SPARSE_IRQ and CONFIG_SYSFS are enabled.
These configurations also need to be enabled for debugging interrupts. After we enable all the debugging configurations for interrupts, as shown in the previous slide, we can debug a lot of things. We can see the handler for a particular IRQ number, the current status, all the flags that are set, the common irq_data state, the CPU affinity of the interrupt, the domain name, the hardware IRQ number, that is, the local interrupt number, and much more. So there is a lot we can debug if we enable the debugfs interface and the configurations shown.
With this, I conclude my presentation on understanding the Linux interrupt subsystem. I hope this will be useful for deeper insight into the Linux interrupt subsystem. I thank you all for being with me till the very last. So, any questions? Thank you.