Hello everyone, my name is Helen Koike and welcome to this presentation about image signal processor (ISP) drivers, and how to merge one upstream. A little bit about me: I've worked with Collabora since 2016, which is an open source consultancy company. I'm mostly working on the kernel, mainly in the media subsystem. I also have experience with other subsystems such as NVMe, ALSA, device mapper, and here and there in general, but mostly inside the media subsystem. Right now I am the maintainer of the rkisp1 driver, which we are going to talk about a little during this presentation, and I'm also the maintainer of the virtual media controller (vimc) driver. I am really proud to say that I was an Outreachy intern in 2015, where I started the vimc project. Laurent Pinchart was my mentor and the one who proposed this project. Right now I am the co-coordinator of the Linux kernel project in Outreachy, together with Vaishali, who is the current coordinator. So if you have any questions regarding this program, or if you want to become a mentor, feel free to reach out. The main goal of this presentation is to give you an overview of the camera ISP memory pipeline. Then we are going to give another overview of the media framework, some design choices when implementing your driver, and also some lessons learned when I was upstreaming the rkisp1 driver. I also want to talk about a specific user space tool called libcamera. So I hope that this presentation is going to be useful for those who are not entirely familiar with the Video4Linux community, but want to get started and upstream their own ISP driver into the subsystem. Okay, so let's start with the camera ISP memory pipeline. On our phone, usually we have the camera, and this camera is composed of a sensor. This sensor is composed of smaller light sensors, represented by the gray grid in this image, which comes from Wikipedia. 
These gray sensors are sensitive to light, and on top of each one there is a color filter that only allows a specific color to pass through, which means that each cell is basically a color sensor. The readings of all those color sensors are sent to your computer, to your SoC, processed in some way, and the image is made available to your application. So what is an ISP? ISP stands for Image Signal Processor, and the common use case is that the ISP receives the readings of all those small color sensors and transforms them into an image that is usable by user space. It can also perform several other image transformations, such as format conversion, debayering for instance. The format with all those small color readings is what we call a Bayer format, and transforming it into actual pixels is the process that we call debayering. We can also convert to RGB, YUV and several others. An ISP can also perform crop, resize, white balance, compose, image stabilization (which we are going to talk a little bit about), effects, filters, flip, rotate and several others. Those are processing steps performed by the hardware, and they help offload your application so that you don't need to perform them in software. An ISP can also generate some statistics, such as histograms, contrast measurements and several others, which are used by applications to implement algorithms that read those statistics and re-parameterize the ISP to improve the image on the fly. Some examples of those algorithms are histogram equalization and what we call 3A: autofocus, auto exposure and auto white balance. What an ISP is not: an ISP is not a codec. ISPs work with raw and uncompressed images, while codecs we can divide into encoders, which transform a raw image into a compressed image format such as H.264, JPEG or VP9 (and there are several others), and decoders, which do the opposite: they transform a compressed image back into a raw image. 
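To make the Bayer format and the debayering step concrete, here is a minimal sketch in Python. It uses the simplest possible nearest-neighbor reconstruction on an RGGB mosaic; real ISPs like the rkisp1 use much more sophisticated interpolation, so the values and the method here are purely illustrative assumptions:

```python
# Minimal nearest-neighbor demosaic of an RGGB Bayer mosaic.
# One raw color sample per photosite becomes one full RGB pixel
# per 2x2 tile (the two green samples are averaged).

def demosaic_rggb(bayer):
    """bayer: 2D list (even dimensions) of raw samples laid out as
       R G
       G B  repeating. Returns a half-resolution RGB image."""
    h, w = len(bayer), len(bayer[0])
    rgb = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            r = bayer[y][x]
            g = (bayer[y][x + 1] + bayer[y + 1][x]) / 2
            b = bayer[y + 1][x + 1]
            row.append((r, g, b))
        rgb.append(row)
    return rgb

mosaic = [
    [10, 20, 12, 22],
    [30, 40, 32, 42],
    [11, 21, 13, 23],
    [31, 41, 33, 43],
]
print(demosaic_rggb(mosaic))
# first tile: R=10, G=(20+30)/2=25.0, B=40 -> (10, 25.0, 40)
```

This is the kind of work the ISP's Bayer processing block does in hardware, frame after frame, so the application never has to.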
Now let's talk a little bit about different kinds of ISPs: inline versus offline. Offline ISPs can be divided into two phases. One retrieves the information from the sensor and places it somewhere in memory. The other one takes this memory, which is usually in a Bayer format, does some kind of processing and places the result back in memory. Usually those two phases are implemented in two separate drivers, and the communication between those two drivers is coordinated by user space. An example is the Intel IPU3, which is divided into the ipu3-cio2 camera interface driver, which gets the image from the sensor and places it in memory, and the IPU3 ImgU driver, which processes this image and sends it to user space. An inline ISP is when the data reaches memory only at the end: the sensor is directly connected to the ISP without touching memory, then the ISP does some kind of processing, and then the image reaches memory for user space. One example is the rkisp1 driver. This is not entirely true, because this hardware can actually do both, but only inline is implemented in the driver. Then we can have hybrid devices, which can get the image directly from the sensor or from memory, so they can behave as inline or perform the second phase of an offline ISP. One example is the MediaTek MT8183 P1 driver, which is not yet upstream but has been posted on the mailing list. So the image can come from two paths: one is represented by the DMA input there, which is memory, and the other one is a connection directly from the sensor, through the camera interface. This data goes to the Bayer processing and then it reaches memory for user space. MIPI D-PHY: let's take a look at this. MIPI D-PHY is a very common bus used in the market for cameras and displays. It is specified by the MIPI Alliance, and it is a physical layer with a high data rate that can carry 4K images, so ultra high definition, ultra high resolution, with a really good frame rate. 
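To get a feeling for why that data rate matters, here is a back-of-the-envelope calculation. The per-lane rate and the frame rate below are illustrative assumptions (actual numbers depend on the D-PHY version and the sensor configuration), but they show why a multi-lane serial bus is needed for 4K raw video:

```python
# Rough bandwidth check: can a 4-lane CSI-2 link carry 4K raw video?
# Lane rate and frame rate are assumptions for illustration only.

width, height = 3840, 2160   # 4K UHD
bits_per_pixel = 10          # RAW10 Bayer samples
fps = 30
lanes = 4
lane_rate_gbps = 1.5         # assumed per-lane D-PHY rate

needed_gbps = width * height * bits_per_pixel * fps / 1e9
available_gbps = lanes * lane_rate_gbps

print(f"needed: {needed_gbps:.2f} Gbps, available: {available_gbps:.1f} Gbps")
# needed ~2.49 Gbps, well under the assumed 6.0 Gbps of 4 lanes
```

A single lane at that assumed rate would not leave much headroom, which is why the bus scales up to four data lanes.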
In this image we have a Raspberry Pi that is connected to a sensor through this flat cable bus. This bus is the MIPI D-PHY. The bus can have up to four data lanes, and it also has an I2C bus for configuration. So in the image on top we have the camera connected to the host, and we have several data lanes, a clock, and then the I2C bus. On top of this bus we can have up to two protocols. One is DSI, the Display Serial Interface, for when the computer wants to output an image to somewhere, usually a display. The other one is CSI-2, the Camera Serial Interface, to capture images from somewhere into the computer, usually from a camera sensor. And MIPI CSI-2 is a really frequent term in ISP land, so I mention this because you're going to see these acronyms in some other diagrams during this presentation. Now I would like to talk about the rkisp1 driver specifically, because I have the most experience with it, and I will also use this driver as an example during this presentation. So rkisp1 is the driver for the ISP block present in the Rockchip RK3399 SoCs. These SoCs can be found in several devices, such as the Scarlet Chromebook (which is the tablet in the image), Rock Pi boards or the Pinebook Pro laptop. The driver was originally written by Rockchip, and it was merged in kernel 5.6 under drivers/staging with more than 9,000 lines of code. So this is the hardware architecture of the Rockchip ISP. The image can come from two buses: one is the MIPI D-PHY that we mentioned, and the other one is a parallel bus, but the parallel bus is not implemented in the driver right now. Then the image goes to the ISP, which performs some kind of processing, then it goes to some image enhancement blocks, and then the image can go to one of two paths, or both of them at the same time. Those two paths can perform cropping, resizing, RGB conversion, and also flip and rotation, and then the image reaches memory. The main difference between those two paths is the use case. 
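Both of those paths contain a resizer block. To make concrete what such a block does conceptually, here is a tiny nearest-neighbor resize sketch; hardware resizers use proper filtering, so this only illustrates the coordinate mapping, not the rkisp1 implementation:

```python
# Conceptual nearest-neighbor resize, the kind of operation the
# main/self path resizer blocks perform in hardware.

def resize_nearest(img, out_w, out_h):
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# 8x8 "sensor" image whose pixels record their own (y, x) coordinates
full = [[(y, x) for x in range(8)] for y in range(8)]
preview = resize_nearest(full, 4, 4)  # downscale, like a preview path
print(preview[0])  # every other source pixel of the first row
```

Downscaling like this is exactly what a fast, lower-resolution preview path needs, while the full-resolution image can take the other path untouched.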
When you have your phone and you want to take a picture, you can see there's a preview, and this preview is really fast. If you rotate, it rotates quickly, and it is display-ready; you can see that there's almost no delay. It's a live preview, so it's really fast, and if you pay attention you can see that the resolution it shows is usually not the full resolution that your camera supports. When you take the picture, it takes a little bit more time, and the final image has a much higher resolution. Basically, the preview in this case would come from the self path, and the picture itself would come from the main picture path. The main picture path supports a higher resolution; it doesn't need to generate RGB display-ready images and it doesn't need to flip or rotate, while the self path needs to be fast for display and doesn't need a higher resolution. One is fast at a lower resolution, and the other is slower at a higher resolution. So the ISP comprises the image signal processing, many image enhancement blocks, crop, resizer, RGB display-ready image conversion, image rotation, and we have those two paths: the self path for preview and the main path for pictures, as we mentioned. Of course those are use cases; user space is free to use them however it wants. Now let's take a look at the kernel side, the media framework. In the Linux kernel media framework there is a concept that we call topology. User space can query a node inside the VFS, let's say /dev/mediaX, to retrieve how the inner blocks of the hardware are interconnected and the order of the image processing. For instance, in this image, user space can know those blocks and how they are connected: it can know that we have a sensor that is directly connected to the ISP, which is connected to some DMA engine where the image reaches memory. 
In the topology we have two types of nodes: what we call sub-devices, in green, which represent the inner parts of the hardware and where user space can perform some configurations, and the video devices, in yellow, which represent the DMA engines, where user space can perform some configurations but can also queue and dequeue buffers containing images or metadata to and from the hardware. In this example it is from the hardware, because we are retrieving an image from the sensor, but we are going to see some other examples where user space needs to inject an image, or some kind of metadata, into the driver. Those blocks are connected by what we call links, and links always connect pads, so those zeros and ones in the image are what we call pads. An interesting thing to note is that usually the sensor block is a driver that is separate from the rest. This is because we can reuse the same sensor with different kinds of hardware. In my development setup I have the same sensor that can be used with the Raspberry Pi that we saw in the previous image, but I use it with the Rock Pi board, so the sensor is the same but the ISP driver is different. This is the topology of the ipu3-cio2 driver, the camera interface that performs the first phase of the offline ISP. We can see here four blocks of the ipu3-csi2 sub-device. It means that we have four buses and we can retrieve images from four sensors at the same time. In this case we only have a single sensor, the imx355, which is connected to the first block, and the other ones are not connected. The yellow blocks are where user space is going to interact, so in this case /dev/video0 to retrieve images from the kernel. 
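The entity/pad/link structure that user space reads back from /dev/mediaX can be sketched as a small data model. The entity names below follow the rkisp1 diagrams, but the pad numbering and the validation rule are simplified illustrations of how the kernel checks, at stream-on, that the format on each link's source pad matches its sink pad:

```python
# Toy model of a media controller topology: entities have pads,
# links connect a source pad to a sink pad (names illustrative).

class Entity:
    def __init__(self, name, n_pads):
        self.name = name
        self.fmt = [None] * n_pads  # (width, height) set per pad

entities = {name: Entity(name, 2) for name in
            ["imx219", "rkisp1_isp", "rkisp1_resizer_mainpath"]}

# (source entity, source pad) -> (sink entity, sink pad)
links = [
    (("imx219", 0), ("rkisp1_isp", 0)),
    (("rkisp1_isp", 1), ("rkisp1_resizer_mainpath", 0)),
]

def pipeline_valid():
    """Simplified link validation: the format configured on each
    link's source pad must match the connected sink pad."""
    return all(
        entities[src].fmt[sp] is not None
        and entities[src].fmt[sp] == entities[snk].fmt[dp]
        for (src, sp), (snk, dp) in links
    )

# With manual configuration propagation, user space sets every pad:
entities["imx219"].fmt[0] = (1920, 1080)
entities["rkisp1_isp"].fmt[0] = (1920, 1080)
entities["rkisp1_isp"].fmt[1] = (1920, 1080)
entities["rkisp1_resizer_mainpath"].fmt[0] = (1280, 720)  # mismatch!
print(pipeline_valid())  # False: starting the stream would fail here
entities["rkisp1_resizer_mainpath"].fmt[0] = (1920, 1080)
print(pipeline_valid())  # True
```

The mismatch case is exactly the failure mode you hit with manually configured pipelines: nothing complains until you try to start streaming.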
The camera interface places the image in memory in a format that is very specific to this Intel driver, so it is up to user space to get this image and feed it to this other driver, the IPU3 ImgU, which performs the second phase of the offline ISP. User space retrieves the image from the previous driver and feeds it to this first node there, "ipu3-imgu 0 input", to inject the image buffer into this driver. It then performs some kind of processing and makes the results available through one of those two paths, "output" and "viewfinder". If I recall correctly, those are the equivalent of the main path and the self path, so one is meant for the picture itself with higher resolution, and the other one is for the preview, which needs to be fast. This driver can also generate some statistics through the node down there, "3a stat", and user space can read those statistics and queue parameters on that node called "parameters". In this case we have two instances of the ImgU, which means it can process two images at the same time. Here is the topology of the rkisp1 driver, which is inline. This is the topology on the Scarlet Chromebook, which is a tablet, so we have a back camera and a front camera. Those camera sensors are represented by the top blocks, starting with "ov", and just one of them can be connected to the ISP at a time; that's why you see one line is dashed. The ISP performs some kind of processing and outputs the image to one of those two paths, main path and self path, as we mentioned before, and this driver can also do something similar to the Intel driver, generating statistics and receiving parameters from user space. Now I want to talk about some driver configuration architecture that you should think about a little when designing your driver, in a specific case: automatic versus manual configuration propagation. In the automatic configuration scheme, user space does all the configurations and operations on the 
DMA engine node, so in the yellow blocks there. When user space wants to set a resolution through the node /dev/video0, it's up to the driver to propagate this configuration to all the blocks of the topology, including the sensor, which can be a separate driver. As you can see, the other blocks don't have any nodes exposed in the VFS. On the other hand, in the manual configuration propagation scheme, user space is the one responsible for configuring all the pads through the whole image pipeline. So for instance, if user space wants to configure a resolution, it needs to configure the resolution that the sensor up there must generate on its pad 0; then it needs to configure the same resolution on pad 0 of the ISP block, to inform the ISP which resolution it should expect from the sensor; then user space needs to configure the resolution that the ISP must generate, which needs to match the resolution that the resizer expects; and it needs to configure the resolution that the resizer should generate, which could be bigger or smaller to zoom or shrink the image, setting the resolution that is expected inside the memory buffer for the final image. As you can see, this increases the complexity for user space, since it needs to perform all those configurations, and if formats don't match, when you try to start the stream it will fail. On the other hand, it gives you finer-grained configuration of the inner blocks of the hardware; I'm going to show you an example of when we need this. But also, the more blocks we expose to user space, the more complex it becomes, since we have more points where we need to perform configuration. Also, manual configuration is extensible. If we come back to the previous slide: can we mix automatic and manual configuration? If you choose one scheme, you need to stick with it. We cannot have manual configuration on the sub-devices or on the sensor block and automatic configuration on the ISP; it doesn't make much sense. So if we want 
to add more blocks in the future, and you chose automatic configuration, and then you realize that a new block could benefit from manual configuration, you won't be able to change it. Or maybe you could add some parameter when loading the driver, but then it won't be compatible with other user space applications anymore. rkisp1 uses manual configuration, and to explain why, I want to talk a little bit about crop. How do you crop an image? You just select, in the main image, the sub-rectangle that you are interested in. The media subsystem allows you to expose an API that lets user space select a sub-rectangle on any of the pads. We could allow user space to select this sub-rectangle only on the final video node there, but we are going to see that we would have some problems, because this driver allows cropping the image between the sensor and the processing in the ISP, and it also allows cropping the image just before the resizer, so it can shrink or zoom a specific region. Exposing this crop only once, on the final video node, would be confusing for user space, since it wouldn't know which crop in the driver it is using. So instead of exposing the crop only on the final node and allowing automatic configuration, we expose the API for selecting a sub-rectangle on specific pads of the topology, to make it clear to user space where the cropping takes place. In this example, we allow user space to select a sub-rectangle on pad 0 of the ISP, to select which part of the image from the sensor it should work with, and also on pad 0 of the resizer, so you can select which part to zoom or shrink. Now, about the image stabilizer and how it works: usually we have the main image, and the image stabilizer just selects a sub-rectangle in the main image. The idea is that only the outer rectangle shakes, and not the inner one, so if you shake your phone, the image shouldn't appear to shake that much. Just to relate a bit: when you open the camera on your phone and you select 
the video mode, you can notice that the angle of the image shrinks a bit; it seems that it zoomed in a little. It means that it's just showing you the sub-rectangle, not the outer part, so it has some space to work with. That means we need to allow user space to select another sub-rectangle, and we expose this one on pad 2 of the ISP. So we have three points where we select sub-rectangles; exposing this at just a single point and allowing automatic configuration doesn't make much sense. That's why in this case we have manual configuration propagation. I want to talk a little bit about some design choices. This was the original topology of the driver when I started working with it: we had this block exposed, the rockchip-mipi-dphy, and we removed it, and I'm going to explain why. This block represents the MIPI D-PHY bus. In manual configuration propagation, as we saw, the more sub-devices we have, the more complex it is for user space, so we need to think carefully about which blocks we want to expose, and the PHY block there doesn't expose any image configuration; it just represents a connection point. Ideally, the useful information in the topology is just the image processing steps. Also, the same processing steps could be used with different buses: the rkisp1 supports the parallel bus (it's not implemented yet, but it could be) and MIPI D-PHY CSI-2, so the PHY block alone doesn't provide much information. If we come back to the original topology: if we wanted to add support for the parallel bus, we would either need to update the topology dynamically depending on which bus you are using, replacing that block with a parallel block, which would be confusing for user space, which needs to perform all the manual configuration; or we would need to expose another block to represent the parallel bus, and one more block means more complexity for user space. That's why we decided to remove it. Some lessons learned: in this and some other drivers that people post upstream, the 
code for the bus is usually integrated with the code of the ISP. If you can, separate those into two different drivers: one inside the media subsystem, and the other one in the PHY abstraction layer subsystem, under drivers/phy. With this you will have a more generic topology for any bus, it is less complex for user space, and the ISP driver itself becomes much cleaner, since you separate those two drivers; and the PHY driver can be used for other protocols as well, DSI for instance, if you can have both protocols on the same lanes. Some more lessons learned, not only from the technical perspective; I also want to talk a little bit about the community perspective. The Video4Linux community is very open to accepting drivers in staging, with the condition that you work on it to move it out as soon as possible. You also need to add a TODO list, so people understand what is missing to move it out of staging. The advantage is that it makes the driver available for other people to use, and it also improves your workflow. It's much easier to get contributions from others if it is already upstream somewhere; otherwise people need to send patches directly to you, and you have to integrate them into your patch set and repost it every time. So it's much easier if it's already upstream. It also makes it easier for people to test and send you bug reports, and it decreases the maintenance cost, since you don't need to keep rebasing all the time, and you can work on it step by step, with the collaboration of the community, which can follow all the progress. So, still about staging: I really recommend you to do that, with the condition that you work to move it out as soon as possible. Some more lessons learned: don't be afraid to reorganize the code. To maintain the driver, it's important to make sure that you feel comfortable with it, that it works your way. So don't be afraid to change the file naming or the code order, or to rewrite functions. I would also recommend splitting the 
code between different files, per exposed node or per block. In some cases this is not entirely possible, because we reuse a lot of code from one block to another, but at least separate the code between the video nodes and the sub-device nodes, since those are different: they have different hooks inside the media framework, and it makes them much easier to review if they're separate. Of course, those are all tips; take them with a grain of salt and check whether they apply to your case. I would also separate the code that configures the hardware from the code that implements the Video4Linux API, mostly because when I am reviewing other people's code, I don't really know the hardware, so I wouldn't know if the register you are writing to is the correct one. If they're well separated (it doesn't need to be different files, it could just be different sections inside a single file), I can focus on the Video4Linux implementation. I also recommend removing all the code that you are not using or that you can't test. The rkisp1 driver also supports the RK3288 SoC, but I wasn't using it and I wasn't testing it, so I just removed it, but kept the code in a way that is extensible, so it's easy to add the support back. Also, the PHY part of the driver had support for all the MIPI D-PHY CSI ports, and it also had support for the DSI port, and I was not using those; most of the hardware that I saw, most of the boards, were also using just a single port, so I just removed everything. The code was already huge, so the idea is to simplify the code but keep it extensible. We also had lots of macros in the headers, a lot of small headers that were not that easy to navigate, and several macros that were not being used, so I just removed everything to make it smaller and easier to review. Now let's talk about a specific project, libcamera, from the user space side. As we can see, not all features are auto-discoverable from user space. An example is the rkisp1 driver, where we can select a sub-rectangle for 
cropping or a sub-rectangle for the image stabilizer, and in the media API there is no way for user space to know whether a given sub-rectangle represents cropping or image stabilization; it needs to know a little bit about the driver. Also, the metadata buffer structures that we have for the statistics and for the parameters are usually in some format that is very specific to the particular driver. This means that it requires user space to have specific implementations for specific drivers, so we would have a specific application for specific hardware. The problem is that the code is not very reusable, it is very hard to test (since you need a specific implementation, you cannot have a generic application), and usually those applications are proprietary. That's where libcamera comes in. libcamera is an open source camera stack for many platforms, with a core user space library. It has user space drivers that have the knowledge of those specific drivers in the kernel, and it also has ways to plug in image processing algorithms. It knows how to deal with the parameters and statistics, and what is nice is that libcamera allows you to plug in your own image processing algorithms as a plugin, so you can plug in your proprietary algorithm, and it separates really nicely the open source part from the proprietary part. It's the equivalent of the Mesa project in the graphics world, but for the camera world. Here is the architecture of libcamera, which you can see in the libcamera docs. I'm not going to go through all those blocks; I just want to mention some specific parts. Down there we can see the MC and V4L2 support blocks, so this is the part responsible for talking with the kernel; we can have buffer allocators; and on top we have the camera device with driver-specific code, so we have the pipeline handlers, which are responsible for configuring the whole topology and know which rectangle means crop and which one means image stabilizer. We can also have image processing 
algorithms, which allow plugins and can be proprietary. So a tip that I can give you: if you are working on upstreaming an ISP, I recommend you to add, push or update support for your hardware in the libcamera project. It makes it easier to test, because sometimes configuring the whole topology is very painful; you get more users and more developers involved, since libcamera is a very recent and very active project, and people are involved on both sides, in libcamera and also in the kernel community, so you're probably going to receive more feedback for your driver, reviews, guidelines, design guidelines, to improve the quality of your driver; and you also contribute to this awesome project. I foresee libcamera being used everywhere in the future, not only on our desktops but on our phones, Chromebooks and several other devices. And that was what I wanted to present to you. Thank you very much for watching, and feel free to reach out if you have any questions or comments; my email address is helen.koike@collabora.com. Thank you very much.

Hello everyone, I'm going to go through the questions now. Just as a warning, there are some dogs barking here in the neighborhood; I hope it's not too annoying. Okay, so let me go through the questions. Can libcamera be used for memory-to-memory ISPs? So, before talking about libcamera, I want to explain what a memory-to-memory device is. In the presentation you saw those yellow blocks; a memory-to-memory device is one of those yellow blocks where you can actually do both: you can inject frames into the topology and you can also get frames from it. Usually memory-to-memory devices are not used for ISPs, they are used for codecs. I don't think there are any ISPs that implement the memory-to-memory API at the moment, but libcamera already has some 
support for the memory-to-memory API, though I don't think it supports any device at the moment. I could be wrong, this should be checked, but at least this is the information that I have, so hopefully this answers the question. The second question: can the two protocols, DSI and CSI, be implemented at the same time, or are they mutually exclusive? In the Rockchip hardware there is one channel that, using the same PHY lanes, can do both. When you configure the hardware you can choose between DSI or CSI, but you cannot do both at the same time; I believe you would need to multiplex it somewhere. Next question: are the blocks implemented in software in ISPs? So, in hardware we have all those features and functionalities, and we know a little bit about the order in which they get executed, and the blocks are the way we model this in software; the blocks are how we expose it to user space, so user space can have an idea of how the image flows inside the hardware and through the pipeline. So it should represent the hardware at some level. I hope that this answers your question. Next question: why aren't the rkisp1 internal blocks, the image enhancement blocks of the ISP, represented as media entities with Video4Linux sub-devices? And any reason to go with parameters as an input instead of configuring the ISP with Video4Linux controls? So, why are the image enhancement blocks not modeled as sub-devices? I didn't think we would gain much with it. They are part of the ISP block in the topology, so there was no big reason to separate them, and having more blocks increases complexity for user space. Maybe we could expose some V4L2 controls for particular image enhancement blocks, but I didn't think we would gain much doing that, to be honest. So, next question, or still the same question, the second part: any reason to go with parameters as an input instead of configuring the ISP with Video4Linux controls? This was something that we discussed. Right now in our topology we receive a bunch of parameters in a buffer, and the media API allows us to expose specific controls; some of those controls are predefined, such as brightness, hue and some other image configurations, and you can also add custom controls. We could have used this control API; we thought about doing this, it was discussed, but one thing that I can see is that there is an advantage in atomicity: when you send the whole buffer, the driver gets the whole data and applies it at once, so on the next frame you have all the configurations applied. We could use another API, for example there is the Request API, but then it gets more complex for user space. It's debatable, actually; I thought that this was easier. Also, the original driver was already implemented like this. But I think with those buffers we have the advantage of atomicity, and we avoid more complexity. Right, so next question: the input parameters node appears to be the same between IPU3 and rkisp1. Is this preferred over the 
different controls? What does the media community prefer, parameters or different controls to configure an ISP? I would say this depends on the driver; I don't think the community prefers one over the other. If it's just a small subset of controls, I would go with controls, but if it's more complex than that, then I believe having an input parameters node is valid. But this should be discussed with the community, actually. Next question: can you give an example of auto-configuration? Can you say that auto-configuration ISPs don't need libcamera? As far as I know, the OMAP ISP is not an example of auto-configuration; you need to configure all the pads of the topology. And about whether, with auto-configuration, we don't need libcamera: I still think we need libcamera, because the driver could have custom controls, and the driver could also have several video nodes; an example is the parameters and the statistics, with some format specific to the driver. So I think it needs a libcamera driver, a user space driver, so applications can be built on top of libcamera and be generic. If we foresee a future where libcamera is used everywhere, then even if the driver doesn't have any specific implementation, I think it's still valid to have support in libcamera: applications that run on top of libcamera don't need to worry about it, and they won't need to implement two different APIs, one for libcamera and one talking directly to the kernel. Next question: how to contribute to the rkisp1 driver? I am interested in working on the parallel protocol. Thank you very much, Roberto, for offering help. To work on the parallel protocol, first you will need some hardware with a board that exposes the bus, where you can plug a sensor and play with the parallel bus. I am not aware of any board that exposes this, so I don't know how to direct you in this matter. But to contribute to rkisp1, you can send me an email and I can give you some pointers; we have some test scripts, and if you have the hardware you can play with the 
code. There is also a TODO list inside the staging folder, where you can see what is missing. Some other things are already work in progress, under discussion on the mailing list. Help is always welcome; you can help with testing as well, and there are also some bits here and there that are not implemented, so you are more than welcome to help. Thank you very much.