Hello everyone, my name is Helen Koike, and welcome to this presentation about Image Signal Processor (ISP) drivers and how to merge one upstream.

A little bit about me: I have been working with Collabora, an open source consultancy company, since 2016, and I mostly work on the kernel, mainly in the media subsystem. I also have experience in other subsystems, such as NVMe and device mapper, and here and there in general, but mostly inside the media subsystem. Right now I am the maintainer of the rkisp1 driver, which we are going to talk about during this presentation, and I am also the maintainer of the virtual media controller (vimc) driver. I am really proud to say that I was an Outreachy intern in 2015, where I started the vimc project; Laurent Pinchart was my mentor and proposed that project. Right now I am the co-coordinator of the Linux kernel project in Outreachy together with Vaishali, who is the current coordinator, so if you have any questions regarding this program, or if you want to become a mentor, feel free to reach out.

The main goal of this presentation is to give you an overview of the camera ISP memory pipeline, then another overview of the media framework, some design choices when implementing our driver, some lessons learned while I was upstreaming the rkisp1 driver, and I also want to talk about a specific user space project called libcamera. I hope this presentation will be useful for those who are not entirely familiar with the Video4Linux community but want to get started and upstream their own ISP driver into the subsystem.

Okay, so let's start with the camera ISP memory pipeline. On our phones we usually have a camera, and this camera contains a sensor. The sensor is composed of smaller light sensors, represented by the gray grid in this image, which comes from Wikipedia. These gray sensors are sensitive to light, and on top of each one there is a color filter that only allows a specific color to pass through, which means that each cell is basically a single-color light sensor. The readings of all those color sensors are sent to your SoC, processed in some way, and the image is made available to your application.

So what is an ISP? ISP stands for Image Signal Processor, and the common use case is that the ISP receives the readings of all those small color sensors and transforms them into an image that is usable by user space. It can also perform several other image transformations, such as format conversion, debayering for instance: the format with all those small color readings is what we call a Bayer format, and transforming it into actual pixels is the process we call debayering. It can also convert to RGB, YUV and several other formats. An ISP can also perform crop, resize, white balance, compose, image stabilization (which we are going to talk a little bit about), effects, filters, flip, rotate and several others. Those are processing steps performed by the hardware, and they help offload your application so that you don't need to perform them in software.

An ISP can also generate statistics, such as histograms, contrast measurements and several others, which are used by applications to implement algorithms that read those statistics and reparameterize the ISP to improve the image on the fly. Some examples of those algorithms are histogram equalization and what we call 3A: autofocus, auto exposure and auto white balance.
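To make that statistics-driven feedback loop a little more concrete, here is a tiny sketch of one auto exposure step. It assumes the ISP exposes a 256-bin luminance histogram and that exposure is controlled in microseconds; both the bin layout and the function names are illustrative assumptions, not any specific driver's interface, and real 3A algorithms are far more elaborate.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Estimate the mean brightness from a 256-bin luminance histogram.
double histogram_mean(const std::array<uint32_t, 256> &hist)
{
    uint64_t sum = 0, count = 0;
    for (std::size_t bin = 0; bin < hist.size(); bin++) {
        sum += static_cast<uint64_t>(hist[bin]) * bin;
        count += hist[bin];
    }
    return count ? static_cast<double>(sum) / count : 0.0;
}

// One naive AE step: nudge the exposure time toward a mid-gray target,
// damped so the loop does not oscillate frame to frame.
uint32_t ae_step(const std::array<uint32_t, 256> &hist, uint32_t exposure_us)
{
    const double target = 128.0;
    double mean = histogram_mean(hist);
    if (mean < 1.0)
        mean = 1.0;
    double factor = 1.0 + 0.5 * (target / mean - 1.0);
    return static_cast<uint32_t>(exposure_us * factor);
}
```

An application would run something like this on every statistics buffer it dequeues and write the new exposure back as a parameter, which is exactly the reparameterize-on-the-fly loop described above.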
What an ISP is not: an ISP is not a codec. ISPs work with raw, uncompressed images, while codecs can be divided into encoders, which transform a raw image into a compressed format such as H.264, JPEG or VP9 (and there are several others), and decoders, which do the opposite and transform a compressed image back into a raw image.

Now let's talk a little bit about different kinds of ISPs: inline versus offline. Offline ISPs can be divided into two phases: one retrieves data from the sensor and places it somewhere in memory; the other takes that memory, usually in a Bayer format, does some kind of processing and places the result back in memory. Usually those two phases are implemented in two separate drivers, and the communication between them is coordinated by user space. An example is the Intel IPU3, which is divided into the ipu3-cio2 camera interface driver, which gets the image from the sensor and places it in memory, and the IPU3 ImgU driver, which processes this image and sends it to user space.

An inline ISP is one where the data only reaches memory at the end: the sensor is directly connected to the ISP without touching memory, then the ISP does some kind of processing, and only then does the image reach memory for user space. One example is the rkisp1 driver. This is not entirely true, because the hardware can actually do both, but only the inline mode is implemented in the driver. So we can also have hybrid devices that can get the image either directly from the sensor or from memory, and behave as an inline ISP or perform the second phase of an offline ISP. One example is the MediaTek MT8183 ISP Pass 1 driver, which is not yet upstream but has been published on the mailing list. The image can come from two paths: one represented by the DMA input, which is memory, and the other a connection directly from the sensor, the camera interface. This data goes to the Bayer processing and then reaches memory for user space.

MIPI D-PHY: let's take a look at this. MIPI D-PHY is a very common bus used in the market for cameras and displays. It is specified by the MIPI Alliance, and it is a physical layer with a high data rate that can handle 4K, ultra-high-definition, ultra-high-resolution images at a really good frame rate. In this image we have a Raspberry Pi connected to a sensor through a flat cable, and this bus is the MIPI D-PHY. The bus can have up to four data lanes, and it also has an I2C bus for configuration; in the image on top we have the camera connected to the host with several data lanes, clock signals and the I2C bus. On top of this bus we can have up to two protocols: DSI, the Display Serial Interface, for when the computer wants to output an image somewhere, usually a display, and CSI-2, the Camera Serial Interface, to capture an image from somewhere, usually a camera sensor, into the computer. MIPI CSI-2 is a really frequent term in the ISP land, and I mention this because you are going to see these acronyms in some other diagrams during this presentation.

Now I would like to talk about the rkisp1 driver specifically, because it is the one I have the most experience with, and I will also use it as an example during this presentation. The rkisp1 is the driver for the ISP block present in the Rockchip RK3399 SoCs.
These SoCs can be found in several devices, such as the Scarlet Chromebook (the tablet in the image), Rock Pi boards or the Pinebook Pro laptop. The driver was originally written by Rockchip, and it was merged in kernel 5.6 under drivers/staging with more than 9,000 lines of code.

This is the hardware architecture of the Rockchip ISP. The image can come from two buses: one is the MIPI D-PHY that we mentioned, and the other is a parallel bus, but the parallel bus is not implemented in the driver right now. The image then goes to the ISP, which performs some processing, then through some image enhancement blocks, and then it can go through one of two paths, or both at the same time. Those two paths can perform cropping, resizing, RGB conversion, flipping and rotation, and then the image reaches memory.

The main difference between those two paths is the use case. When you pick up your phone to take a picture, you see a preview, and this preview is really fast: if you rotate the phone it rotates quickly, it is display-ready, and there is almost no delay, a live preview. If you pay attention, you can see that the resolution it shows is usually not the full resolution your camera supports. When you actually take the picture, it takes a little bit more time and the final image has a much higher resolution. In this case the preview would come from the self path and the picture itself would come from the main path. The main picture path supports a higher resolution; it doesn't need to generate RGB display-ready images and doesn't need to flip or rotate, while the self path needs to be fast for display and doesn't need a high resolution. So one path is fast and low resolution, the other is slower but high resolution. The ISP comprises the image signal processing block, many image enhancement blocks, crop, resizer, RGB conversion for display-ready images, image rotation, and those two paths: the self path for preview and the main path for pictures, as we mentioned. Of course, those are use cases; user space is free to use them however it wants.

Now let's take a look at the kernel side, the media framework. In the Linux kernel media framework there is a concept we call topology. User space can query a node inside the VFS, say /dev/mediaX, to retrieve how the inner blocks of the hardware are interconnected and the order of the image processing. For instance, in this image user space can know which blocks exist and how they are connected: it can know that we have a sensor directly connected to the ISP, which is connected to some DMA engine where the image reaches memory. In the topology we have two types of nodes: what we call sub-devices, in green, which represent the inner parts of the hardware and where user space can perform some configurations, and the video devices, in yellow, which represent the DMA engines, where user space can also perform configurations but, more importantly, can queue and dequeue buffers containing images or metadata to and from the hardware. In this example it is from the hardware, because we are retrieving an image from the sensor, but we are going to see other examples where user space needs to inject an image or some kind of metadata into the driver. Those blocks are connected by what we call links, and links always connect pads; the zeros and ones in the image are what we call pads.
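As an illustration of how this query works at the lowest level (in practice most people just run media-ctl -p, which prints the whole topology), here is a minimal sketch that enumerates the entities of a media device through the MEDIA_IOC_ENUM_ENTITIES ioctl; the /dev/media0 path is just an assumption for the example.

```cpp
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

int main()
{
    // Open the media controller node exposed by the driver.
    int fd = open("/dev/media0", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/media0");
        return 1;
    }

    // Walk the entities (sub-devices and video devices) one by one.
    struct media_entity_desc entity;
    memset(&entity, 0, sizeof(entity));
    entity.id = MEDIA_ENT_ID_FLAG_NEXT;
    while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &entity) == 0) {
        printf("entity %u: %s (%u pads, %u links)\n",
               entity.id, entity.name, entity.pads, entity.links);
        entity.id |= MEDIA_ENT_ID_FLAG_NEXT;  // ask for the next entity
    }

    close(fd);
    return 0;
}
```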
An interesting thing to note is that the sensor block is usually a driver separate from the rest. This is because we can reuse the same sensor driver with different kinds of hardware: in my development setup I have the same sensor that can be used with the Raspberry Pi we saw in the previous image, but I use it with the Rock Pi board, so the sensor driver is the same while the ISP driver is different.

This is the topology of the ipu3-cio2 driver, the camera interface that performs the first phase of the offline ISP. We can see four ipu3-csi2 sub-device blocks; it means we have four buses and can retrieve images from four sensors at the same time. In this case we only have a single sensor, the IMX355, connected to the first block, and the others are not connected. The yellow blocks are where user space interacts, in this case /dev/video0, to retrieve images from the kernel. The camera interface places the image in memory in a format that is very specific to this Intel hardware, so it is up to user space to get this image and feed it to the other driver, the IPU3 ImgU, which performs the second phase of the offline ISP. User space retrieves the image from the previous driver and feeds it to the first node there, the ipu3-imgu 0 input, to inject the image into a buffer inside this driver; the ImgU then performs some processing and makes the result available through one of two paths, the output and the viewfinder. If I recall correctly, those are the equivalent of the main path and the self path: one is meant for the picture itself at a higher resolution, and the other is meant for a fast preview. This driver can also generate statistics through the node down there, 3A stats, and user space can read those statistics and write parameters to the node called parameters. In this case we have two instances of the ImgU, which means it can process two images at the same time.

Here is the topology of the rkisp1 driver, which is inline. This specific one is the topology of the Scarlet Chromebook, which is a tablet with a back camera and a front camera. Those camera sensors are represented by the top blocks starting with OV, and only one of them can be connected to the ISP at a time, which is why one of the links is dashed. The ISP performs some processing and outputs the image through one of the two paths, main path and self path, as we mentioned before, and this driver can also do something similar to the Intel driver, generating statistics and receiving parameters from user space.

Now I want to talk about driver configuration architecture, something you should think about when designing your driver, in this specific case automatic versus manual configuration propagation. In the automatic configuration scheme, user space does all the configuration and operations on the DMA engine node, the yellow block there. When user space wants to set a resolution, it sets it through the /dev/video0 node, and it is up to the driver to propagate this configuration to all the blocks of the topology, including the sensor, which can be a separate driver. As you can see, the other blocks don't expose any nodes in the VFS.
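In the automatic scheme the application only ever talks to the video node, so the whole configuration boils down to something like the following sketch; the /dev/video0 path and the chosen resolution and pixel format are just examples, not specific to any driver.

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

// Configure the capture resolution on the video (DMA engine) node only;
// in the automatic scheme the driver propagates it to the ISP and sensor.
int configure_capture(const char *devnode /* e.g. "/dev/video0" */)
{
    int fd = open(devnode, O_RDWR);
    if (fd < 0)
        return -1;

    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 1920;
    fmt.fmt.pix.height = 1080;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;

    int ret = ioctl(fd, VIDIOC_S_FMT, &fmt);
    close(fd);
    return ret;
}
```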
On the other hand, in the manual configuration propagation scheme, user space is the one responsible for configuring all the pads along the image pipeline. For instance, if user space wants to configure a resolution, it needs to configure the resolution that the sensor up there will generate on its pad 0. Then it needs to configure the same resolution on pad 0 of the ISP block, to inform the ISP which resolution it should expect from the sensor. Then user space needs to configure the resolution that the ISP will generate, which needs to match the resolution that the resizer expects, then configure the resolution that the resizer should generate, which could be bigger or smaller to zoom or shrink the image, and finally set the resolution expected inside the memory buffer for the final image.

As you can see, this increases the complexity for user space, since it needs to perform all those configurations, and if the formats don't match, starting the stream will fail. On the other hand, it gives you finer-grained control over the inner blocks of the hardware, and I am going to show you an example of when we need this. But the more blocks we expose to user space, the more complex it becomes, since there are more points where configuration needs to be performed.

Manual configuration is also more extensible. And, coming back to the previous slide, we cannot mix automatic and manual configuration: if you choose one scheme you need to stick with it. Having manual configuration on the sensor sub-device there and automatic configuration on the ISP doesn't make much sense. So if you want to add more blocks in the future, and you chose automatic configuration, and then you realize that a new block could benefit from manual configuration, you won't be able to change it. Or maybe you could add a parameter when loading the driver, but then it won't be compatible with other user space applications anymore.
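In the manual scheme each of those steps is a VIDIOC_SUBDEV_S_FMT call on a pad of a sub-device node (plus a final VIDIOC_S_FMT on the video node), roughly like the sketch below; the sub-device path, pad numbers and formats are illustrative assumptions. Tools like media-ctl can do the same thing from the command line.

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/v4l2-subdev.h>

// Set the format on one pad of one sub-device node, e.g. /dev/v4l-subdev2.
// In the manual propagation scheme this is repeated for every pad along the
// pipeline, and the formats on both ends of each link have to match.
int set_pad_format(const char *subdev /* e.g. "/dev/v4l-subdev2" */,
                   unsigned int pad, unsigned int width, unsigned int height,
                   unsigned int mbus_code /* e.g. a Bayer media bus code */)
{
    int fd = open(subdev, O_RDWR);
    if (fd < 0)
        return -1;

    struct v4l2_subdev_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
    fmt.pad = pad;
    fmt.format.width = width;
    fmt.format.height = height;
    fmt.format.code = mbus_code;
    fmt.format.field = V4L2_FIELD_NONE;

    int ret = ioctl(fd, VIDIOC_SUBDEV_S_FMT, &fmt);
    close(fd);
    return ret;
}
```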
So why is the rkisp1 configuration manual? To answer this, I want to talk a little bit about crop. How do you crop an image? You just select a sub-rectangle of the main image, the sub-image you are interested in. The media subsystem lets you expose an API that allows user space to select a sub-rectangle on any pad, actually, but let's see what happens if we only allow user space to select this sub-rectangle on the final node, /dev/video0. This driver allows cropping the image coming from the sensor, before doing the processing in the ISP, and it also allows cropping the image just before the resizer, so it can shrink or zoom a specific region. Exposing this crop only once, on the final video node, would be confusing for user space, since it wouldn't know which crop inside the driver it is using. So instead of exposing the crop only on the final node and allowing automatic configuration, we expose the API for selecting a sub-rectangle on specific pads of the topology, to make it clear to user space where the cropping takes place. In this example we allow user space to select a sub-rectangle on pad 0 of the ISP, so it can choose which part of the image from the sensor it should work with, and also on pad 0 of the resizer, so it can choose which part to zoom or shrink.

Now, about the image stabilizer and how it works: usually we have the main image, and the image stabilizer just selects a sub-rectangle inside it. The idea is that only the outer rectangle shakes and not the inner one, so if you shake your phone the image shouldn't appear to shake that much. To relate this to everyday use: when you open the camera on your phone and select video mode, you can notice that the field of view shrinks a bit, as if it zoomed a little; it is only showing you the sub-rectangle, not the outer part, so the stabilizer has some margin to work with. This means we need to allow user space to select yet another sub-rectangle, and we expose this one on pad 2 of the ISP. So we have three points where a sub-rectangle can be selected, and exposing this in just a single place while allowing automatic configuration doesn't make much sense. That's why, in this case, we have manual configuration propagation.
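Since all three of those sub-rectangles go through the same selection API, here is a rough sketch of how user space sets one of them on a sub-device pad with VIDIOC_SUBDEV_S_SELECTION; the pad number and target shown are illustrative, and the exact pad/target pairing is something the driver documentation has to spell out.

```cpp
#include <cstring>
#include <sys/ioctl.h>
#include <linux/v4l2-subdev.h>

// Select a crop sub-rectangle on a given pad of an already-open sub-device.
// For the rkisp1, different pads carry different meanings (crop from the
// sensor, region to resize, stabilizer window), so user space has to know
// which pad it is talking to.
int set_pad_crop(int subdev_fd, unsigned int pad,
                 int left, int top, unsigned int width, unsigned int height)
{
    struct v4l2_subdev_selection sel;

    memset(&sel, 0, sizeof(sel));
    sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
    sel.pad = pad;
    sel.target = V4L2_SEL_TGT_CROP;
    sel.r.left = left;
    sel.r.top = top;
    sel.r.width = width;
    sel.r.height = height;

    return ioctl(subdev_fd, VIDIOC_SUBDEV_S_SELECTION, &sel);
}
```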
I also want to talk a little bit about some design choices. In the original topology of the driver, when I started working with it, we had a block exposed for the Rockchip MIPI D-PHY, and we removed it; I am going to explain why. This block represents the MIPI D-PHY bus. In the manual configuration propagation scheme, as we saw, more sub-devices means more complexity for user space, so we need to think carefully about which blocks we want to expose, and the D-PHY block doesn't expose any image configuration, it just represents a connection point. Ideally, the useful information in the topology is just the image processing steps, and the same processing steps could be used with different buses; in this case the rkisp1 hardware supports both the parallel bus (not implemented yet, but it could be) and MIPI CSI-2, so that block doesn't provide much information. Coming back to the original topology: if we wanted to add support for the parallel bus, we would either need to update the topology dynamically depending on which bus is in use, replacing the D-PHY block with a parallel block, which would be confusing for user space that has to perform all the manual configuration, or we would need to expose another block to represent the parallel bus, and one more block means more complexity for user space. That's why we decided to remove it.

Some lessons learned. In this and some other drivers that people post upstream, the code for the bus is usually integrated with the code of the ISP. If you can separate those into two different drivers, one inside the media subsystem and the other in the PHY abstraction layer subsystem under drivers/phy, you get a more generic topology for any bus, it is less complex for user space, and the ISP driver itself becomes much cleaner, since the two concerns are separated. The PHY driver can then be used for other protocols as well, DSI for instance, if both protocols can share the same lines.

Some more lessons learned, not only from the technical perspective, but also from the community perspective. The Video4Linux community is very open to accepting drivers into staging, on the condition that you work on moving them out as soon as possible. You also need to have a TODO list with the requirements, what is missing to move the driver out of staging. The advantage is that it makes the driver available for other people to use, and it also improves your workflow: it is much easier to get contributions from others if the code is already upstream somewhere; otherwise people need to send patches directly to you, and you need to integrate them into your patch set and repost it every time. It also makes it easier for people to test and send you bug reports, and it decreases the maintenance cost, since you don't need to keep rebasing all the time and you can work on it step by step, with the collaboration of the community, which can follow the progress. So, still about staging: I really recommend it, on the condition that you work to move the driver out as soon as possible.

Some more lessons learned: don't be afraid to reorganize the code. If you are going to maintain the driver, make sure you feel comfortable with it, that it is done your way, so don't be afraid to change file names, restructure the code or rewrite functions. I would also recommend splitting the code between different files, per node or per block in the topology. In some cases this is not entirely possible, because a lot of code is reused from one block to another, but at least separate the code for the video nodes from the code for the sub-device nodes, since they are different and have different hooks inside the media framework, and it makes them much easier to review if they are separate. Of course, these are all tips, so take them with a grain of salt and check whether they apply to your case.

I would also separate the code that configures the hardware from the code that implements the Video4Linux API, mostly because when I am reviewing other people's code I don't really know the hardware, so I wouldn't know whether the register you are writing to is the correct one. If they are well separated (they don't need to be in different files, they could just be different sections inside a single file), I can focus on the Video4Linux implementation for the review.

I also recommend removing all the code that you are not using or that you can't test. For example, the rkisp1 driver also supports the RK3288 SoC, but I wasn't using it and wasn't testing it, so I just removed it, but kept the code in a way that is extensible, so it is easy to add the support back.
Also, the D-PHY part of the driver had support for all the MIPI CSI ports, and it also had support for the DSI port. I was not using that, and most of the boards that I saw were also using just a single port, so I removed all of it; the code was already huge, and the idea is to simplify the code while keeping it extensible. We also had lots of macros in the headers, a lot of small headers that were not easy to navigate, and several macros that were not being used, so again: remove everything, make it smaller and easier to review.

Now let's talk about a specific project in user space, libcamera. As we saw, not all features are auto-discoverable from user space. An example is the rkisp1 driver, where we can select a sub-rectangle for cropping or a sub-rectangle for the image stabilizer, and in the media API there is no way for user space to know whether a given sub-rectangle represents cropping or image stabilization, which means that user space needs to know a little bit about the driver in the kernel. Also, the metadata buffer structures we have for the statistics and the parameters are usually in a format that is very specific to the particular driver. This means user space needs driver-specific implementations, so we would end up with a specific application for a specific hardware, and the problem is that this code is not very reusable, it is very hard to test, since you need a specific implementation and cannot have a generic application, and usually those applications are proprietary.

That's where libcamera comes in. libcamera is an open source camera stack for many platforms with a core user space library. It has user space drivers that carry the knowledge of those specific drivers in the kernel, and it also has image processing algorithms that know how to deal with the parameters and statistics. What is nice is that libcamera allows you to plug in your own image processing algorithms as a plugin, so you can plug in your proprietary algorithm, and it separates the open source part from the proprietary part really nicely. We can think of libcamera as the equivalent of the Mesa project in the graphics world, but for cameras.

Here is the architecture of libcamera, which you can find in the libcamera documentation. I am not going to go through all the blocks, I just want to mention some specific parts. Down there we can see the MC and video front end support block (MC standing for media controller), which is the part responsible for talking to the kernel, and we have buffer allocators. On top we have the camera device with driver-specific code: the pipeline handlers, which are responsible for configuring the whole topology and know which sub-rectangles mean cropping and which ones mean image stabilization. We also have the image processing algorithms, which can be plugins and can be proprietary.

So a tip that I can give you: if you are working on upstreaming an ISP, I recommend that you add, push or update support for your hardware under the libcamera project. It makes it easier to test, because sometimes configuring the whole topology is very painful; you get more users and more developers involved, since libcamera is a very recent and very active project and people are involved on both sides, in libcamera and in the kernel community; and you are probably going to receive more feedback on your driver, reviews, guidelines and design guidelines to improve the quality of your driver, and you also contribute to this awesome project.
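To give an idea of what applications see on top of all this, here is a minimal sketch using libcamera's public C++ API that just starts the camera manager and lists the cameras the pipeline handlers have discovered; build details and error handling are omitted, and this is only a sketch of the API surface rather than a complete application.

```cpp
#include <iostream>
#include <memory>
#include <libcamera/libcamera.h>

int main()
{
    // The camera manager enumerates cameras through the pipeline handlers,
    // which hide all the kernel topology configuration from the application.
    std::unique_ptr<libcamera::CameraManager> cm =
        std::make_unique<libcamera::CameraManager>();
    cm->start();

    for (const std::shared_ptr<libcamera::Camera> &camera : cm->cameras())
        std::cout << "Found camera: " << camera->id() << std::endl;

    cm->stop();
    return 0;
}
```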
I foresee libcamera being used everywhere in the future, not only on our desktops but on our phones, Chromebooks and several other devices.

And that was what I wanted to present to you. Thank you very much for watching, and feel free to reach out if you have any questions or comments; my email address is helen.koike@collabora.com. Thank you very much.