Okay, hello and thank you everyone for being here. Today we're going to talk about camera sensor driver compliance. A little introduction: my name is Jacopo, I work as an embedded camera engineer for Ideas on Board. Ideas on Board is a software consulting company specialized in camera and multimedia support for Linux systems. Most of my contributions are in Video4Linux and in libcamera, since the very beginning of the project.

Let's start by saying what we're not going to talk about today, because software compliance can be tied to different things: license compliance, when you have to interoperate software components whose licenses might not be compatible with one another; compliance to law, GDPR is a good example of that; compliance to a development process, if you have a well-defined development process; and compliance to some standard, which usually comes with a conformance suite that allows you to put a stamp on your product, and you're very happy about that. We're going to talk instead about API compliance, and an API could actually be considered a sort of standard. An API defines a specification, an interface that should be implemented by two software components in order to operate one with the other. It can be expressed in different forms, like documentation or a formal language description, and it can be validated in many different ways. You can have static code analysis, which is simply people reading your code, or tools like linters that can validate the API compliance of your implementation, and I'm sure there are tools I'm not aware of that can do that with some AI-powered stuff. The most common form of validation is runtime validation. Unit testing can also be considered a form of compliance testing, to make sure that you are not regressing your API implementation. Fuzzing, which means injecting error conditions or unexpected parameters into your API and seeing how it reacts, is also a form of compliance testing, as are correctness checks; I'm thinking of Valgrind, which tests correctness in terms of memory allocation and deallocation.

As we're talking about camera sensor drivers and compliance, everybody who has been developing Video4Linux drivers knows, or should know, v4l2-compliance. v4l2-compliance is a great tool. It's part of the v4l-utils suite, which comes with many other very useful tools like v4l2-ctl and media-ctl. It helps you test, while developing, that your driver supports the operations it claims to support according to the capabilities it exposes, and it also fuzzes the implementation to check that corner cases are well handled and your driver is not faulty. v4l2-compliance is a great tool, it helps a lot during development, and everybody who is developing Video4Linux drivers should be using it. But is API compliance enough to guarantee interoperability between different implementations of the same API?

If you have been following camera development over the last four or five years you might have heard of libcamera. libcamera is a userspace camera stack which aims to abstract away the complexity of the Video4Linux interfaces and offers a unified API to applications and frameworks to interface with cameras. The reason being that cameras got complex a long time ago. I'm sure you have seen this slide already.
This is from 2009, and it's the userspace interface of the OMAP3 camera driver. It's very complicated, and applications which used to work with a simple pipeline and used to be portable among different platforms are now left to suffer and have to be ported to platform A, B and C in order to work. libcamera fills that gap: it offers a unified interface to frameworks and applications and abstracts away the platform details, so your application is finally portable. That's great, it works. But what happens if you have a single platform and you want to use different camera modules? That's a typical case for embedded development boards. The most famous case is the Raspberry Pi, which has a set of different sensors, but many embedded boards offer the same kind of selectable camera modules: you have a connector and you can swap in different modules. And of course modules have drivers, which might differ one from the other. So we now have a single consumer for different implementations of sensor drivers, and what could possibly go wrong with that? Well, a lot of things, actually.

A little note on the focus: the sensor drivers I'm talking about are mostly raw Bayer sensor modules, because that's the main target of libcamera and those are the sensors most commonly used with platforms that have an ISP. The aim of the presentation is partly to share the pain we have experienced interoperating with different camera modules over the last years, but also to give sensor driver developers, and reviewers, a list of tips, things to watch out for when submitting or reviewing code which is meant to be interoperable between different implementations. I have selected a basic set of features, basic in the sense that it's what libcamera expects drivers to offer, but they're also features that most modern applications expect, and for each one of them I would like to show how drivers got things a little different one from the other.

So let's start with exposure and gain. That's a concept we're very well used to, it comes from the usual DSLR cameras, it's a parameter we're used to operating. Exposure is usually expressed as a duration: the time that your sensor is exposed to light, usually in microseconds or nanoseconds. Gain instead, which might be digital or analogue, is expressed as a scalar value: a multiplier applied to all the colour channels. Both values can be computed by some kind of algorithm, the AGC algorithm, when the camera is operating in auto mode, or they can come from the application or the user when the camera is operating in manual mode.

If you look at exposure, that seems very simple: we have a single control in Video4Linux for it, and it's typically expressed as the number of lines that should be exposed to light. But it turns out the V4L2 specification does not really specify a unit for the control. So some drivers use lines, but some drivers use fractions of lines to express the same concept, and they're all technically compliant with the specification. It's impossible to operate them generically if they each interpret the specification slightly differently. When it comes to analogue gain it's even worse. We have three controls for gain: digital, analogue, and a generic gain for sensors that are not able to discriminate between the analogue and digital parts, and the unit of the control is device specific.
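To make the consumer-side problem concrete, here is a minimal user-space sketch, under the assumption of a sensor exposed as /dev/v4l-subdev0, that sets exposure and analogue gain through the standard V4L2 controls. The numeric values are placeholders, and what they mean, lines versus fractions of lines, raw register code versus linear multiplier, depends entirely on the driver, which is exactly the portability problem:

```c
/*
 * Illustrative sketch only: set exposure and analogue gain on a sensor
 * subdevice. "/dev/v4l-subdev0" and the numeric values are placeholders.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/v4l2-controls.h>
#include <linux/videodev2.h>

static int set_ctrl(int fd, unsigned int id, int value)
{
	struct v4l2_ext_control ctrl = { .id = id, .value = value };
	struct v4l2_ext_controls ctrls = {
		.which = V4L2_CTRL_WHICH_CUR_VAL,
		.count = 1,
		.controls = &ctrl,
	};

	return ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);
}

int main(void)
{
	int fd = open("/dev/v4l-subdev0", O_RDWR);

	/* 1000 *what*? Lines on some drivers, fractions of lines on others. */
	set_ctrl(fd, V4L2_CID_EXPOSURE, 1000);

	/* A raw register code: its meaning is entirely device specific. */
	set_ctrl(fd, V4L2_CID_ANALOGUE_GAIN, 0x80);

	return 0;
}
```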
It's usually the actual register value that is meant to be written to the sensor to control the gain, and it's usually very poorly documented, so it requires a lot of experimentation to get right. This is a simplified view of the solution we had to implement in libcamera to deal with that. It's what I've just described: you have an application that might be supplying exposure and gain values in manual mode, or you have an AGC algorithm that computes those two values in auto mode, but in both cases they are expressed as microseconds and as a scalar value, and you have to apply them to different sensors. So we had to come up with a set of helpers that translate the analogue gain value to the specific register value for each sensor, while for exposure we decided to go the simple way and just compute it in lines.

So the takeaway from this, as a tip for driver implementers: if your driver exposes a feature that allows you to control exposure, please use lines as the unit. It's very unlikely that you need to control sub-line durations for exposure; there might be cases for that, but mostly there aren't. When it comes to gain, please use analogue gain whenever possible. Look at your datasheet, try to find out the difference between analogue and digital gain, and try not to use V4L2_CID_GAIN for those sensors, because it conflates digital and analogue gain. And whenever possible, provide an implementation of the gain model in libcamera, so that your driver can be operated generically and we have a single source of knowledge that translates a generic value into your module-specific register value.

Another interesting feature is the traditional H/V flips. They allow you to do very simple 2D plane transformations: mirroring, flipping, 180-degree rotation. That seems very simple, right? They're very simple controls, but they have subtle implications when it comes to raw sensors. In fact, they can change the image format without user space actually asking to change the image format. Let's look at a simplified example of a raw sensor pixel array. This is the first pixel. Sensors are usually mounted upside down to compensate for the lens inversion effect, and this is the default readout direction: we read pixels along a row, then line by line. And we have a Bayer pattern, which is the expression of the colour filter array placed on the pixel array: if we go and read, we have a green pixel, a red pixel, then a blue pixel, and then a green pixel, and that gives this Bayer pattern code. But if we apply a flip, for example the horizontal flip, we're going to read the pixel samples in a different direction. So we're going to read a red pixel, then a green pixel, then a green pixel, then a blue pixel, and that changes the Bayer pattern produced by the sensor. Same for the vertical flip, same for both flips together. So if you allow flipping to happen while streaming, you are basically changing the format of your image without a set-format ever being applied. That's something Video4Linux has a way to notify user space about: there is a flag which tells you exactly that changing this control changes the content of the buffer or the memory layout. It's there, it's meant to be used for this kind of situation, but only six drivers in mainline use it.
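For driver writers the corresponding pattern is small. This is a hedged sketch of what registering the flip controls could look like, loosely modelled on what some mainline raw Bayer sensor drivers do; the struct and function names are placeholders:

```c
/*
 * Hedged driver-side sketch: register HFLIP/VFLIP and mark them with
 * V4L2_CTRL_FLAG_MODIFY_LAYOUT so user space knows that toggling them
 * changes the Bayer order of the produced frames. Names are placeholders.
 */
#include <media/v4l2-ctrls.h>

struct my_sensor {
	struct v4l2_ctrl_handler ctrls;
	struct v4l2_ctrl *hflip;
	struct v4l2_ctrl *vflip;
};

static int my_sensor_init_flips(struct my_sensor *sensor,
				const struct v4l2_ctrl_ops *ops)
{
	struct v4l2_ctrl_handler *hdl = &sensor->ctrls;

	sensor->hflip = v4l2_ctrl_new_std(hdl, ops, V4L2_CID_HFLIP, 0, 1, 1, 0);
	if (sensor->hflip)
		/* Changing this control changes the Bayer pattern. */
		sensor->hflip->flags |= V4L2_CTRL_FLAG_MODIFY_LAYOUT;

	sensor->vflip = v4l2_ctrl_new_std(hdl, ops, V4L2_CID_VFLIP, 0, 1, 1, 0);
	if (sensor->vflip)
		sensor->vflip->flags |= V4L2_CTRL_FLAG_MODIFY_LAYOUT;

	return hdl->error;
}
```

Drivers that do this typically also report a media bus code that follows the current flip values, so the Bayer order advertised to user space stays truthful.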
Now, I haven't checked exactly how many drivers are raw Bayer sensor drivers that support flips, but I suspect there are more than six in mainline. So it might happen that some sensor allows you to flip while streaming without notifying user space that the colour pattern could change.

Another interesting thing is rotation. We have been dealing with that: we upstreamed a very lengthy description of the device tree property maybe two years ago, and it was not meant to be that controversial, actually. We defined a device tree property which allows you, as a device integrator, to specify how your camera is rotated in your device, and we provided helpers for drivers to parse the device tree and register the value from the device tree through a read-only control. That seems very non-controversial, right? Well, it turns out, and we already knew that, that most drivers are programmed through register sequences: the vendor provides you a set of register blobs and you simply write them to the bus to apply that configuration to the sensor. Most of those sequences have the H/V flips enabled inside, because the vendor assumed your camera is going to be mounted upside down to compensate for the lens inversion effect we mentioned before. So it turns out that some drivers noticed that and got creative in order to work around it. In fact this has been removed, I think in the latest release, 6.3 I guess, but some drivers refused to even probe if they were not rotated 180 degrees upside down, because they had the H/V flips enabled and said: I cannot operate if I'm not rotated the way the configuration tells me I'm rotated. Also CCS, which is one of the most featureful drivers we have in mainline at the moment, had a creative way of dealing with that: it basically tried to compensate for the implicit V flip and H flip implied by the rotation and inverted them. So if your camera is rotated and you apply a flip, it is actually inverted. That is very confusing, and more than that, if every driver deals with these things differently, it's very hard to operate them generically. None of them was technically wrong, they complied with the API, but they were not predictable. So that was good, in a way, because we had to sit down a bit and have a discussion about how to deal with these kinds of things. We were meant to write documentation for that as well.

As a tip for implementers: always register rotation with the value that comes from the DT property; the helpers really help with that. Do not mangle it, do not try to compensate for it. If your driver programming sequences have the H/V flips enabled by default, please register the flip controls enabled by default. It's rather easy to find out: you have a register blob, and if you have a datasheet, you can look at how the registers are programmed by default; if the flips are enabled, please register the controls with a default value of 1. And do not auto-compensate for rotation, because user space knows better in that case: it knows what the use case is, it knows what the application wants to do, and if the driver tries to outsmart user space, it gets very confusing. There is another thing we have dealt with in the past, which is the selection targets.
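As a possible probe-time sketch of that tip, here is what using the standard fwnode helpers could look like: the rotation and orientation DT properties are registered as read-only controls, unmodified, and the flips are registered with a default of 1 because, in this hypothetical driver, the vendor register sequences enable both of them. Names are placeholders:

```c
/*
 * Hedged probe-time sketch: expose the 'rotation' and 'orientation'
 * firmware properties as read-only controls through the standard helpers,
 * and register the flips with a default of 1 because (in this hypothetical
 * driver) the register blobs enable both flips.
 */
#include <linux/device.h>
#include <media/v4l2-ctrls.h>
#include <media/v4l2-fwnode.h>

static int my_sensor_init_props(struct device *dev,
				struct v4l2_ctrl_handler *hdl,
				const struct v4l2_ctrl_ops *ops)
{
	struct v4l2_fwnode_device_properties props;
	int ret;

	/* Parse the 'rotation' and 'orientation' firmware properties. */
	ret = v4l2_fwnode_device_parse(dev, &props);
	if (ret)
		return ret;

	/* Expose them as read-only V4L2 controls, unmodified. */
	ret = v4l2_ctrl_new_fwnode_properties(hdl, ops, &props);
	if (ret)
		return ret;

	/* The register blobs enable both flips: make the defaults say so. */
	v4l2_ctrl_new_std(hdl, ops, V4L2_CID_HFLIP, 0, 1, 1, 1);
	v4l2_ctrl_new_std(hdl, ops, V4L2_CID_VFLIP, 0, 1, 1, 1);

	return hdl->error;
}
```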
We want to know the geometry of the sensor, and we are kindly requesting drivers, though I hope libcamera will make it mandatory in the future, to support a few selection targets. Some of them are trivial. We want to know the full pixel array size; that's a property which is sometimes useful to expose to applications. We want to know the bounds of the pixel array, so the whole readable pixel array area, which includes the valid and non-valid pixels, like the optically dark pixels used for black level correction, or pixels which are shielded: we want to know whether it's possible to read them. We want to know the default analogue crop, because that has implications on the field of view of your images. And, more interestingly, we want to know the current crop rectangle, what is usually called the analogue crop rectangle. Three targets are static, they do not change during driver operation: the full pixel array size, the crop bounds, and the crop default, which specifies the full field of view of your sensor. But the crop rectangle, the analogue crop rectangle, has implications on two things which matter for sensor drivers. The first is the image field of view: the more you restrict the crop rectangle, the more information you lose. It also impacts the sensor frame rate, because the larger the portion of the pixel array that you feed to the internal processing pipeline, the longer it takes to be processed, so the lower the frame rate. So it's very important to know, in the current configuration, what your analogue crop rectangle is. Just as an example, these are two images with the exact same output resolution, 1080p if I recall correctly, but in one case that's the full field of view, so the analogue crop rectangle was maximized, while in the other case it was reduced, so you lose a lot of information around the edges, and probably this mode is faster than the other one. So, the crop target: please implement it. So far we only require the selection targets to be readable, but as we want to go in a direction where we can fully control the configuration of the sensor, we would like to make them writable as well soon, also to allow changing the field of view of the image being produced.

Another interesting thing is blankings. We have had lengthy discussions in the past about how to control the frame rate of the sensor. Video4Linux provides an API, the get/set frame interval operations, which is kind of a misleading API: it gives you a false sense of simplicity, but it hides a lot of details. We all know the frame duration is a very simple formula: you have the total number of pixels that you put on the bus, visible plus blankings, which are the non-active ones, and you have the pixel rate; that gives you the frame duration, which of course gives you the frame rate. But it depends on several parameters, pixel rate and blankings, and if we provide an API like set/get frame interval, we are losing a lot of things in the middle. A sensor might be operated differently: at a different pixel rate with the same blankings, or you might want to enlarge the blankings and maintain the same pixel rate. That's something that should be controllable from user space; drivers should not try to outsmart user space in that regard. So the total frame size depends on the blankings as well.
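To make the formula concrete, here is a small user-space sketch, with a placeholder device path and a placeholder 1920x1080 active size, that reads HBLANK, VBLANK and PIXEL_RATE from the sensor subdevice and derives the frame duration; this is roughly what a framework has to do instead of relying on the frame interval operations:

```c
/*
 * Illustrative sketch of the frame duration formula:
 *   duration = (width + hblank) * (height + vblank) / pixel_rate
 * The device path and the 1920x1080 active size are placeholders.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/v4l2-controls.h>
#include <linux/videodev2.h>

static int64_t get_ctrl(int fd, unsigned int id, int is_64bit)
{
	struct v4l2_ext_control ctrl = { .id = id };
	struct v4l2_ext_controls ctrls = {
		.which = V4L2_CTRL_WHICH_CUR_VAL,
		.count = 1,
		.controls = &ctrl,
	};

	if (ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls) < 0)
		return -1;

	return is_64bit ? ctrl.value64 : ctrl.value;
}

int main(void)
{
	int fd = open("/dev/v4l-subdev0", O_RDWR);
	unsigned int width = 1920, height = 1080;	/* active format */

	int64_t hblank = get_ctrl(fd, V4L2_CID_HBLANK, 0);
	int64_t vblank = get_ctrl(fd, V4L2_CID_VBLANK, 0);
	int64_t pixel_rate = get_ctrl(fd, V4L2_CID_PIXEL_RATE, 1);

	/* Total pixels per frame (visible + blanking) over the pixel clock. */
	double duration = (double)(width + hblank) * (height + vblank) /
			  (double)pixel_rate;

	printf("frame duration %.6f s, ~%.2f fps\n", duration, 1.0 / duration);
	return 0;
}
```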
But what happens when you apply a new format? The total frame size depends on blankings and visible pixels, and a new format changes the visible size of the frame you are producing. Well, some drivers reset the blankings to their defaults, which means that applying a format changes the frame rate of your stream. Some drivers adjust the blankings only if they exceed the limits, so you apply a new format and you maintain the frame duration. Again, none of this is really specified in the API. It seems like an implementation detail, but if you're building something on top of it and you make assumptions, you might be displeased.

And another interesting thing about blankings: you might have seen this pattern in many, many sensor drivers. We know that the maximum exposure time is limited by the total size of the frame you are producing. So if you change the blankings, you might have to change the exposure limits as well. This is a pattern that all drivers implement, all drivers that support controllable blankings at least, and it's something that should probably be handled somewhere in the core instead of having drivers repeat the same pattern every time. So VBLANK limits exposure, and if you need to set both of them you have to be very careful: you need to set VBLANK first, let the driver update the exposure limits, and then set the exposure again. You have to be careful about the order of the operations, because if you do things in the wrong order you might get surprising results. As a result of that, not long ago Benjamin tried to overcome this by introducing a new control, which is currently under discussion. That's something that was meant to happen a long time ago, and if I recall correctly it started as a discussion in libcamera and then moved to the kernel, which to me is a great thing, because it means that cross-pollination between a userspace framework and the drivers meant to be used by that framework produces better implementations in kernel space.

There are other interesting things where the API might actually fall short. We know that if this is the output frame resolution and this is the analogue crop on the pixel array, the same output resolution can be obtained in different ways: by binning, by sub-sampling (skipping), or by cropping. But we don't currently have an API to express that. So drivers of course had to be creative, and some of them kind of abuse the selection targets to compute the scaling factor, while others use the crop target and the output size to compute the scaling factor. Again, none of those is actually wrong according to the API, but there is no way to control it reliably from user space.

So the takeaway from all of this is that writing an application that works generically with multiple sensor drivers is very hard. It's probably not as hard as porting between different platforms or SoCs, but you need something that abstracts away that complexity, because otherwise you will be implementing the same thing in your application every time.
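For reference, this is roughly the per-driver pattern in question, as a hedged sketch modelled on what mainline drivers such as imx219 do in their control handler: when VBLANK changes, the exposure range, expressed in lines, is re-clamped to the new frame length. The margin of 4 lines and the structure layout are placeholders:

```c
/*
 * Hedged sketch of the recurring driver pattern: when VBLANK changes, the
 * exposure range (in lines) must be re-clamped to the new frame length.
 * Names, the 4-line margin and the struct layout are placeholders.
 */
#include <media/v4l2-ctrls.h>

#define MY_SENSOR_EXPOSURE_MARGIN	4	/* lines, sensor specific */

struct my_sensor {
	struct v4l2_ctrl_handler ctrls;
	struct v4l2_ctrl *exposure;
	unsigned int height;			/* active frame height, lines */
};

static int my_sensor_s_ctrl(struct v4l2_ctrl *ctrl)
{
	struct my_sensor *sensor =
		container_of(ctrl->handler, struct my_sensor, ctrls);

	if (ctrl->id == V4L2_CID_VBLANK) {
		/* The frame is now (height + vblank) lines long. */
		s64 exp_max = sensor->height + ctrl->val -
			      MY_SENSOR_EXPOSURE_MARGIN;
		s64 exp_def = sensor->exposure->default_value < exp_max ?
			      sensor->exposure->default_value : exp_max;

		__v4l2_ctrl_modify_range(sensor->exposure,
					 sensor->exposure->minimum, exp_max,
					 sensor->exposure->step, exp_def);
	}

	/* ... then write the control value to the sensor registers ... */
	return 0;
}
```

From user space, the consequence is the ordering constraint just described: set VBLANK first, let the driver update the exposure limits, then set the exposure.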
And when the API doesn't help, for many reasons, because the API can be underspecified, sometimes for good reasons, and it cannot capture every implementation detail, drivers might get creative, and the only way to overcome that is to be careful during review, or while you are implementing them. Something that I think is very important is that having a standard consumer of the kernel API is a guarantee that different implementations are more consistent one with the other. API compliance, even if we have tools for it, is not enough to guarantee interoperability. As I've said, it requires a lot of review effort to get things consistent among different implementations, and if we have a reference implementation that defines the expectations, it's easier even for driver implementers to get things, I'm not saying right, but more consistent one with the other. Another thing which I think is relevant is that for a long time kernel APIs, not just in media, have been implemented but not exercised consistently. When you introduce a new API, you are meant to provide a test for it in some test utility, but the way it is actually exercised is very dependent on the application that uses the implementation. Again, having a reference consumer of the implementation guarantees that the design choices made in kernel space are actually sensible for user space as well. Going forward, like it happens for DRM/KMS, if you want to introduce a new API, you should have to provide an implementation in some reference framework like Wayland or IGT or other test suites. We don't have anything like that formalized for media yet, but going forward, having a standard consumer that driver developers who want to introduce a new API are expected to provide an implementation for is, I think, a guarantee that the design choices made in kernel space are actually sensible for applications as well. That will be it for me. I know I'm probably being too fast, but if there are questions... otherwise we're going to have an early lunch. Any questions? Hans? I can repeat the question, if you want, for the microphones.

Okay, well, it makes perfect sense. I understand that there is a lot of legacy there, and... oh, sorry, yes. Hans' remark was that v4l2-compliance doesn't just do API testing. As I partially said, it also does fuzzing and tests the implementation for correctness, and most of the things mentioned in this presentation could actually be implemented as tests, as part of v4l2-compliance. Is that correct, Hans? Okay, yes, I totally agree with that. And I understand that there is a lot of legacy in v4l2-compliance, and the landscape of devices it was meant to deal with has changed over the years. So yeah, there might be space there, indeed. And v4l2-compliance is a wonderful tool, and everybody should be using it if they are upstreaming code to Video4Linux. Eugene? If the compliance tool requires specific behaviour, shouldn't that be set in stone in the API documentation? Compliant to the latest... Testing. Hey! Wow. Okay, let's try that again. So, the question was... oh, you repeated it already, so no need to do that. v4l2-compliance is for new drivers, like new sensors, and you want to make sure that they comply with the best practices. And you can't... sometimes you can put it back into documentation, but quite often there are already older drivers that do it like that, and well, then it's part of the public API.
But what you really want is that new drivers should avoid all these problems that you have discovered. And whether the compliance test would warn or fail on that is something you can discuss, but it should certainly output a message if the driver doesn't comply. Since I have the mic anyway, I can ask a follow-up question: do we have a gold-standard sensor driver that people can use as a reference when they write a new one? Yes and no. The most featureful one is indeed CCS, and it's the one that got the most attention, I guess; it has been developed by one of the maintainers of the subsystem, so it has a lot of features, but it's also very complex. The golden standard for libcamera use cases are usually the sensors used by Raspberry Pi, because they put a lot of effort into the implementation, both in the sensor drivers in terms of features and in libcamera. So I would mention the IMX219 as an example, the IMX290 probably, the IMX296 and IMX477 as well. What is the certification? It's review, you're right. So yeah, for raw sensors I would mention the drivers for the Raspberry Pi camera modules in that regard. Eugene, are there questions as well, or? Yeah, I have a question. Okay. I don't want to force you. So you discovered all these kinds of issues with all the controls, but have you gone back to enhance the documentation for them, on the media documentation site, to add some kind of use cases? For example, when I read them a while ago, and I don't know if you have already added something to them or not, analogue gain was just "this is the analogue gain". So maybe we could enhance that with such use cases, so that people get used to it, or deprecate some of the controls with some pointers, because... That's an example where the blame is also on us, because I'm pointing these things out, and, like you said, why not correct the specification in that case? Because sometimes, you know, for analogue gain it really depends on the sensor implementation. The exposure example I made: for some drivers it's fine to expose it in sub-line units. I don't really see a need for that, but maybe the driver implementation and the use case justify it. When it comes to rotation, we had like three meetings to discuss it, and we were supposed to update the documentation with the outcome, but we never actually got there. So you're right, all those things should be formalized in documentation and in corrections to the v4l2-compliance tests, and that's something, yeah, I take the blame a little bit for not having done already. That might help with getting reviews done more easily, because people start a driver, they send a patch, and after some time someone like you, or someone else who knows all this, comes and says: okay, you're doing it all wrong, so you have to rework everything. If it's not documented, like you say, it becomes tribal knowledge, right? I know it because somebody told me, somebody told them. I know, I understand that, totally correct. Okay, thank you. Once again. Sorry, something just popped into my head. Talking about the exposure control where there's no unit defined, but you would like to use lines: I think it's perfectly fine to put it in the documentation that it is strongly preferred to use lines as the unit for the control unless it's absolutely not possible, something along those lines. I don't think there's any problem putting that in there, and then it's at least in the specification. Okay, yeah. Thank you.
Yeah, about exposure control and lines and, in general, the compliance: is there a different way for global shutter cameras to handle all that, and is there a different kind of compliance for global shutter cameras? I don't think it's that different when it comes to exposure, but of course there are differences. Nothing comes to the top of my mind, but I'm sure there are differences, and those should be specified in the API documentation, indeed. I don't know if anybody has some ideas about that? One of the things that we don't support in Video4Linux today, when it comes to exposure time, is being able to specify both the coarse exposure time and the fine exposure time, and most sensors support that, regardless of whether they're global shutter or rolling shutter, actually. We've simply had no need for it so far, at least among the people coming to us, upstreaming drivers and using those drivers, so it's something we haven't done, and just the number of lines is fine for most use cases, but it's not going to cover everything. So I'd like to extend that to at least being able to control the fine exposure, and possibly, there are sensors that express the exposure time in different units as well. Do we want to convert everything to lines in the kernel to have a uniform API to user space, or are there other options? That's for another moment. So if you have good ideas on how to unify that, patches would be welcome, or at least proposals and ideas, so we can discuss them, because one of the things we're missing is feedback from the industry on use cases that we wouldn't know about, or shortcomings that we wouldn't know about, and that's very valuable for us. And that's relevant for HDR, which is something that is not very well exercised in Video4Linux at the moment, as far as I can tell. So industry use cases are very valuable there, to validate the decisions. Any other questions? Thank you then. Enjoy your lunch and thank you for being here. Thank you.