Hello everyone, welcome to this talk about libcamera. I am Paul Elder, a software developer at Ideas on Board and one of the developers on the libcamera team. Today I will take you through libcamera's journey and show you the bright future that is ahead of us. There will even be a live demo, so stay tuned.

Before we get started, let's start with the basics. What really is a camera? What is there between receiving light and software receiving an image? It all starts at the imaging sensor, which converts light to digital values. It has an array of light-sensitive photodiodes. Each of these photodiodes corresponds to a pixel, but it can only detect a level of light, not a color. So to detect color, each pixel gets a color filter, which only allows light of a specific color to pass through. The filters are arranged in a pattern like the one shown. There are two greens for every one red and one blue, since the human eye is more sensitive to green. But this means that each pixel in our image is missing two colors. To solve that, we have to do a bit of processing: we can interpolate the missing values. As shown in the diagram, for each pixel, we take the missing colors from the neighboring pixels that do have that color. This is just a simple example; in practice, for better quality, more complex interpolations would be used. Here's an actual example. In image 3, you can see that in the red part of the tulip there are a bunch of black pixels. These are the pixels that do not have the red color filter, so they don't detect anything. In image 4, after CFA interpolation has been done, the tulip is all red, but it is pixelated, as the missing values have been interpolated from the neighboring pixels.

That's not the only processing that needs to be done. Another example is lens shading. The lens of the camera is round, while the imaging sensor is rectangular, which means that less light reaches the corners of the sensor. This makes the corners of the image dark, like in this picture. These aren't the only issues, though. There are many more that need to be accounted for, such as dead pixels in the sensor that need to be ignored, or leakage currents in the photodiodes that cause a small value to be output even though the received light is zero. So cameras need a complex pipeline, like the one you can see in the diagram, just to go from the raw captured image to a usable image. These operations are way too expensive to implement in software in real time, so they are implemented in hardware devices called image signal processors, or ISPs.

In addition to the image capture pipeline, there is also 3A, which is auto-exposure, auto-white-balance, and auto-focus, that we have to deal with. The luminance of the scene in front of the camera typically varies constantly. This requires adjusting the integration time and gain of the sensor accordingly to produce an image that is neither underexposed nor overexposed. The same is true for the white balance, which requires adjusting color gains based on the light in the scene, or for focus, as people can move in front of the camera. Parameters that control the lens, the sensor, and the ISP need to be computed in real time, based on an analysis of the captured images. This is computationally intensive, but fortunately the ISP comes to the rescue by computing statistics, such as histograms. Based on those statistics, algorithms compute the processing parameters for the next frame and apply them to the device. The same process repeats for every frame.
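Before moving on, here is a minimal sketch that makes the CFA interpolation step from a moment ago concrete, assuming an RGGB Bayer layout. The function and type names are just for illustration, and real ISPs use far more sophisticated algorithms; this only shows the idea of filling in each missing color from the same-color samples in the neighborhood.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

/* Color of the Bayer (RGGB) filter covering the photodiode at (x, y). */
enum class Color { R = 0, G = 1, B = 2 };

static Color bayerColor(int x, int y)
{
	if (y % 2 == 0)
		return x % 2 == 0 ? Color::R : Color::G;
	return x % 2 == 0 ? Color::G : Color::B;
}

struct RGB {
	uint16_t r, g, b;
};

/*
 * Naive CFA interpolation: each color at a pixel is the average of the
 * same-color samples in the surrounding 3x3 neighborhood (including the
 * pixel's own sample for its own color).
 */
std::vector<RGB> demosaic(const std::vector<uint16_t> &raw, int width, int height)
{
	std::vector<RGB> out(width * height);

	for (int y = 0; y < height; y++) {
		for (int x = 0; x < width; x++) {
			uint32_t sum[3] = {};
			uint32_t count[3] = {};

			for (int dy = -1; dy <= 1; dy++) {
				for (int dx = -1; dx <= 1; dx++) {
					int nx = x + dx;
					int ny = y + dy;
					if (nx < 0 || nx >= width || ny < 0 || ny >= height)
						continue;
					int c = static_cast<int>(bayerColor(nx, ny));
					sum[c] += raw[ny * width + nx];
					count[c]++;
				}
			}

			out[y * width + x] = {
				static_cast<uint16_t>(sum[0] / std::max<uint32_t>(count[0], 1)),
				static_cast<uint16_t>(sum[1] / std::max<uint32_t>(count[1], 1)),
				static_cast<uint16_t>(sum[2] / std::max<uint32_t>(count[2], 1)),
			};
		}
	}

	return out;
}
```

For interior pixels this reduces to the classic bilinear demosaic; higher-quality methods additionally look at edges and gradients to avoid the pixelation visible in the tulip example.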
If you think that the 3A algorithms are optional, they are not. The top two pictures were taken on IPU3 devices by the Linux Surface community, before libcamera had image processing algorithms for IPU3. IPU3 is the ISP used in devices such as the Microsoft Surface devices. The bottom picture is after libcamera got image processing algorithms for IPU3. There's clearly a pretty big difference in how usable the images are. Last but not least, the algorithms need to be calibrated and tuned for every combination of camera sensor and optics. So you can see how much complexity is involved in getting a usable image from an imaging sensor to an application.

Now, before we can discuss the future, we first need to understand the past of cameras on Linux. I'm not actually old enough to remember the old days of cameras connected to computers with composite or parallel ports. When I started using cameras on Linux, everything was a UVC camera, where the webcam is connected via USB, either internally or externally, and it was exposed as just a single video node in the file system. You could interact with it through the Video4Linux2, or V4L2, API. You could take any application that wants a video feed from a webcam, like Jitsi in Firefox for example, and it would access the one video node for the camera using V4L2, and it would just work. The whole complex camera pipeline that we saw earlier was implemented inside the little box on the other side of the USB cable, so from the application's point of view, we never had to deal with any of that complexity.

But then we got complex cameras. On many embedded devices, all of the components of a camera got split up. We have at least an image sensor and a receiver, and sometimes also an ISP. The image sensor outputs data on a dedicated hardware interface, such as MIPI CSI-2. The receiver is integrated in the SoC, and we can use it to transfer images to memory using DMA. If we also have an ISP, then we need to send images back and forth to get usable quality images. All these components and how they interact must be coordinated in user space, and every SoC platform is different. So the media controller API and V4L2 extensions were designed and developed to expose the full features of the ISP to user space. The problem now is that all of the burden of managing the camera pipeline is placed on the applications. They have to program the sensor and the ISP parameters, capture raw frames, pass the raw frames to the ISP, capture the statistics from the ISP, and implement the image processing algorithms. And all of this is different for every device, since different ISPs behave differently and algorithms have to be designed around these differences.

The first device that brought this issue to light was the OMAP3 camera on the Nokia N900 in 2009, but more recently it has been devices like the Acer Chromebook Tab 10 and the Dell Latitude 7285. Other devices include the Microsoft Surface devices, the HP Chromebook X2, the Rock Pi, and even the Raspberry Pi. There are advantages to adding this complexity by physically separating the image sensor from the ISP. One is that you have more space for a higher quality image sensor, since the ISP is physically separate. Another is that vendors can implement more advanced image processing algorithms, because they have more control over the ISP. So they can do things such as focus assistance with face recognition or advanced HDR processing.
These features are advanced and can be differentiating factors between vendors, so they are a source of compatibility difficulties with free software. In any case, complex cameras are the trend now, and they are expected to become more common in laptops going forward, so we have to deal with them. So now our video call in Firefox cannot work with a complex camera anymore, because there isn't a single node that you can use to control the whole camera.

Imagine you have your application, let's say OBS for example, and in the source code, under the video sources directory, you have the V4L2 source, and let's say you also have a Windows camera source and a Mac camera source. But then you would have to add an IPU3 source if you want to support devices like the Dell Latitude, the HP Chromebook X2, and the various Microsoft Surfaces. And you need to add a Rockchip ISP source to support devices like the Rock Pi and the Acer Chromebook Tab, a Raspberry Pi source to support the Raspberry Pi, a vimc source to support vimc, an i.MX source to support i.MX devices, a Qualcomm source to support Qualcomm devices, and so on. For every new SoC you want to support, you'd have to add a source specifically for it in each application. Don't forget, this involves the entire capture pipeline, including passing buffers between the sensor and the ISP, configuring them, and the corresponding image processing algorithms. All of this just to get good quality images, and none of it is optional. And every application that wants to support all of these different devices would have to duplicate the effort. This is clearly unrealistic, and that's why none of these applications actually support any devices with complex cameras on Linux.

At Nokia, they tried to solve the problem with designs based on platform-specific plugins for libv4l, the V4L2 wrapper library. But then in 2011, Nokia decided to cancel its lines of Linux-based phones and switch to Windows Phone, and development of user space solutions stopped. After that, Linux had no embedded camera stack, and the situation was never fixed in user space.

Then, in 2018, after contacts with the industry, we started the libcamera project. libcamera would provide an open source and complete user space camera stack, with a new API designed specifically for working with cameras, as opposed to generic video devices like V4L2 ones. There would be a feature set that at least matched the capabilities of the Android Camera API, which has more features than V4L2. libcamera uses existing V4L2 and media controller kernel drivers and does not come with a new kernel API. So now, all the complex details of handling each device-specific pipeline and image processing can be done once, in libcamera, and applications that use libcamera don't have to redevelop every single component of each pipeline for each device. They even get to use an API that was designed specifically for, and makes sense for, cameras.

One of the requirements for libcamera was that it had to be able to enumerate cameras and support using multiple cameras concurrently. Another was that it had to support multiple concurrent streams for each camera, with different resolutions and formats. This way, an application can, for example, have a smaller resolution stream for preview while recording a higher quality stream, with both streams originating from the same camera. Of course, this is only available if the camera supports it.
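To give an idea of what that looks like from the application side, here is a minimal sketch using the libcamera C++ API to configure a small viewfinder stream alongside a higher-resolution recording stream from the same camera. The calls reflect the libcamera API as I understand it and may differ slightly between versions; error handling is omitted for brevity.

```cpp
#include <memory>

#include <libcamera/libcamera.h>

using namespace libcamera;

int main()
{
	/* Start the camera manager and grab the first camera it enumerates. */
	CameraManager cm;
	cm.start();
	if (cm.cameras().empty())
		return 1;

	std::shared_ptr<Camera> camera = cm.cameras()[0];
	camera->acquire();

	/* Ask for two streams from the same camera: preview and recording. */
	std::unique_ptr<CameraConfiguration> config =
		camera->generateConfiguration({ StreamRole::Viewfinder,
						StreamRole::VideoRecording });

	config->at(0).size = { 640, 480 };    /* small preview stream */
	config->at(1).size = { 1920, 1080 };  /* full-quality recording stream */

	/* validate() adjusts the configuration to what the camera can actually do. */
	config->validate();
	camera->configure(config.get());

	/* ... allocate buffers, create requests and start capturing ... */

	camera->release();
	cm.stop();
	return 0;
}
```

If the camera cannot produce two concurrent streams, validate() will adjust the configuration accordingly, which is what "only available if the camera supports it" means in practice.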
libcamera also had to support per-frame controls, which guarantee that control parameters are applied precisely to the frame that the application requested them to be applied to.

Next is image processing algorithms, which, as we have seen, are an important part of the camera pipeline. They are also some of the most protected vendor IP. We want to work together with vendors, so the image processing algorithms, or IPAs as we call them, are separated into plugins called IPA modules, which are implemented as shared objects. This allows us to have different implementations of IPAs; for example, we can have both a closed-source IPA and an open-source IPA. Of course, you won't use them at the same time, but you can choose between them. We do offer open-source implementations of IPAs ourselves, so this does not mean that we're forcing closed source into this free software camera stack. These IPA modules do not have direct access to kernel devices. They can only communicate with the rest of the system via the pipeline handler. They receive statistics from the pipeline handler and send computed parameters back, which the pipeline handler applies to the devices. This means that closed-source IPAs cannot cheat and use undocumented device interfaces. In addition, for every platform there is an IPA interface, which defines what functions, callbacks, and structs are available for communication between the pipeline handler and its IPA. As the pipeline handler is required to be open source, so is this IPA interface definition. This makes it a lot easier to experiment with and to implement open-source IPAs, which was a lot more difficult before, without a camera stack.

libcamera comes with a mechanism for process isolation to run the IPA in a separate process. If the platform integrator decides that they don't want to run untrusted binary code with full access to the system, they can enable this isolation. Different sandboxing mechanisms can be implemented on top of it. We have an IPC mechanism for communication between the pipeline handler and the isolated IPA, such that neither side knows nor cares whether the IPA is isolated. The pipeline handler can call the IPA as normal, and the IPA can send data back as normal, regardless of whether the IPA is isolated or not. This facilitates development.

And lastly, even though libcamera offers a native API, we decided that we need adaptation layers to be backward compatible with existing APIs, to ease the transition to libcamera. We have a V4L2 compatibility layer that allows most V4L2 applications to work with libcamera without requiring recompilation. This works by intercepting library calls, such as open, close, mmap, and ioctl, via LD_PRELOAD. It only supports features equivalent to what one would expect of a UVC camera, however, so it's still best to transition to libcamera. We have managed to do video calls in Firefox with libcamera via this compatibility layer. We also have an Android camera HAL, or hardware abstraction layer, for libcamera, which allows libcamera to be used on both Android and Chrome OS systems, as they both use the Android Camera API. Here's a little diagram of how the V4L2 compatibility layer works. With LD_PRELOAD, you can intercept function calls, so we intercept the relevant system calls, and if they correspond to a V4L2 device that's used by a libcamera camera, then we do the conversion for compatibility. There's also a GStreamer source element for libcamera, called libcamerasrc.
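As a quick illustration of that GStreamer element, here is a minimal preview pipeline built around libcamerasrc, using the standard GStreamer C API from C++. The pipeline string is just an example; any downstream elements can be used, and the same pipeline can also be launched from gst-launch-1.0.

```cpp
#include <gst/gst.h>

int main(int argc, char **argv)
{
	gst_init(&argc, &argv);

	/* Simple preview pipeline using the libcamera source element. */
	GError *error = nullptr;
	GstElement *pipeline = gst_parse_launch(
		"libcamerasrc ! videoconvert ! autovideosink", &error);
	if (!pipeline) {
		g_printerr("Failed to create pipeline: %s\n", error->message);
		g_error_free(error);
		return 1;
	}

	gst_element_set_state(pipeline, GST_STATE_PLAYING);

	/* Block until the pipeline posts an error or end-of-stream message. */
	GstBus *bus = gst_element_get_bus(pipeline);
	GstMessage *msg = gst_bus_timed_pop_filtered(
		bus, GST_CLOCK_TIME_NONE,
		static_cast<GstMessageType>(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
	if (msg)
		gst_message_unref(msg);

	gst_element_set_state(pipeline, GST_STATE_NULL);
	gst_object_unref(bus);
	gst_object_unref(pipeline);
	return 0;
}
```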
For Google Summer of Code this year, we got multi-stream support in the GStreamer element, so it works with any camera in libcamera that supports multiple concurrent streams. The GStreamer element has been successfully tested with the GNOME camera application. It can also be linked to encoders or stream data over the network, though currently there are still some limitations and more development is needed. These are more like support in major frameworks than adaptation layers, but PipeWire has been supporting libcamera for a while now, and they're keeping up with changes to libcamera in their releases. There's also ongoing work on Python bindings.

Those are the main features of libcamera. Now, let's move on to how you, as users, can join in on this new world of cameras. A library isn't very useful if there are no users, and I will be grouping application developers into the same category. The obvious thing is that we want you to use libcamera. The question is how? Inside the libcamera repo we have two test applications, cam and qcam. They can capture frames from the camera. As they are test applications, they clearly aren't very useful on their own. The best is to use libcamera within applications. One way you can do that right now is with the GStreamer element. It works like any other GStreamer video source would, and even supports multiple streams. There are some features missing, so we're always looking for contributors.

What we really want is to add libcamera support to applications. This is what it really means to have users of a library. This is a diagram of what you do not want to do. And this is what we want: every application having a libcamera source. Although libcamera does not have a stable release yet, we are getting close, and I think support like this can be added primarily out of tree, so that once libcamera does have its first stable release, all these applications can get libcamera support at once. Another option is to use libcamera via GStreamer or PipeWire, if some of the advanced camera control features aren't necessary. Using libcamera through GStreamer or PipeWire might be the easier option for closed-source applications as well, and could allow us to help; otherwise it is a bit difficult to help with closed applications. They should keep in mind that Linux users on devices with complex cameras, such as the Microsoft Surface, will not be able to use their great proprietary video conferencing software without support for libcamera.

For Chromium, we have a prototype of native libcamera support. This is a screenshot of a video call in Chromium, with John Michelle on the top right using a Surface Go 2 and libcamera. This is an old version of libcamera, from before we had image processing algorithms for the IPU3. Today the quality is much better, and you won't notice that libcamera is involved.

At this point, I would like to do a live demo. Actually, this whole presentation was a live demo. Over the weekend, I added basic libcamera support to OBS, so I'm actually recording this whole presentation with OBS. Here we can see the list of cameras in my video source, called libcameraInput. You can see that these camera names, though hard to read, match up with what libcamera would give you when you run the cam test application. More human-readable names were one of the corners that I cut to get the demo working on time.
The V4L2 input is still available, but you can see that I'm not cheating, as the available devices from this source clearly match up with the libcamera cameras and not with the V4L2 devices that I have listed here. The names are kind of weird because we needed unique and consistent names. I measured the time it took me to add this basic libcamera support to OBS, and it was about 16 hours, which included the time it took to study the OBS video source API. The code is available on my GitHub, which you can get from the link in the slide.

When normal users use libcamera, it's usually via some application or framework, so you want support in those. This way, normal users can use these applications on devices with complex cameras, and it will work exactly as it does on devices with simple cameras. For embedded devices, make your new application use the libcamera API. It's the future, and as we've seen, it's very difficult to build the whole pipeline from scratch, and we certainly don't want closed-source solutions, especially if an open-source one is available and good. There is a guide in the libcamera sources on how to write a libcamera application. Of course, you can also message us on IRC or email the mailing list; we'll be glad to help with using libcamera. The contact information will be available at the end of the presentation.

One more thing. I'm not sure if this belongs in the vendor part or the user part, but I think it might interest some users. As mentioned before, there are specific interfaces for communication between the IPA and the pipeline handler, which are specifically designed for each platform. One purpose was to restrict system access for the IPA modules, but another purpose is to make it easier to develop, experiment with, adjust, or control IPAs. If you're the type of person that likes to write compilers in their free time, perhaps this would be a fun side project.

The other side of a useful library is support for many platforms. That's where the vendors come in. Let's see how you all can join our camera world. Here's what the camera stack looks like. It is not the focus of this presentation, so I won't go too much in depth. At the very bottom, we have the hardware and kernel drivers. These are not part of libcamera, but they obviously need to be implemented for camera support. As long as they implement the media controller API, no change is required on the kernel side to work with libcamera, though upstreaming kernel drivers may require changes as part of the review process with the kernel community. libcamera has driven the development of extensions to V4L2 and the media controller API to fulfill the needs of the platforms we work with. We have also encountered ambiguities and design deficiencies in V4L2 and worked on fixing them. The libcamera team has extensive experience with kernel development in the media subsystem, so we can also help vendors in this area if their platforms have needs that are not covered yet. libcamera is, however, a user space framework, and it is not a hostile takeover of kernel development, so we cannot help vendors bypass the requirements of the kernel community.

The core of what needs to be developed by vendors is the pipeline handler and the IPA module. The pipeline handler is required to be open source, but that should be fair, because it's the bare minimum that's required to use the camera at all. The IPA module can still be, though we don't encourage it, closed source, though in that case it will not be shipped with libcamera.
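To make that division of labor concrete, here is a purely hypothetical sketch of how a pipeline handler and its IPA module interact for each frame. None of these class or method names are the real libcamera interfaces (those are defined per platform, as mentioned earlier); the point is only the direction of control: the pipeline handler talks to the kernel devices and calls into the IPA, never the other way around.

```cpp
#include <cstdint>
#include <vector>

/* Hypothetical types standing in for ISP statistics and parameters. */
struct IspStatistics {
	std::vector<uint32_t> lumaHistogram;
};

struct IspParameters {
	uint32_t exposureTime;
	float analogueGain;
	float wbGains[3];
};

/* Hypothetical IPA module: pure computation, no access to kernel devices. */
class MyPlatformIPA
{
public:
	IspParameters processStatistics(const IspStatistics &stats)
	{
		IspParameters params{};
		/* Run the AE/AWB/AF algorithms on the statistics here. */
		(void)stats;
		return params;
	}
};

/* Hypothetical pipeline handler: owns the kernel devices and drives the IPA. */
class MyPlatformPipelineHandler
{
public:
	void frameCompleted()
	{
		/* 1. Read the statistics buffer produced by the ISP (via V4L2). */
		IspStatistics stats = readStatisticsFromIsp();

		/* 2. Hand the statistics to the IPA and get parameters back. */
		IspParameters params = ipa_.processStatistics(stats);

		/* 3. Apply the parameters to the sensor and ISP for the next frame. */
		applyToDevices(params);
	}

private:
	IspStatistics readStatisticsFromIsp() { return {}; }
	void applyToDevices(const IspParameters &params) { (void)params; }

	MyPlatformIPA ipa_;
};
```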
The IPA interface is required to be designed such that the pipeline handler is what controls the IPA, and not the other way around. This is to prevent the secret sauce in the IPA from being required just to run the camera at all. There is extensive documentation on how to develop pipeline handlers, and even IPAs, and there are examples of both as well. As usual, we are available to provide support. As can be seen in the diagram, there are also a bunch of helper classes that should ease the development of pipeline handlers, such as for I/O, configuration, managing camera sensors, and managing V4L2 devices. Vendors also do not have to worry about GStreamer or Android support at all, since the adaptation layers included in libcamera already handle that.

Now, on to licensing, because it's important. The libcamera core and the adaptation layers are licensed under the LGPL. This includes the pipeline handlers, which need to be published according to the license; the IPA modules are excluded. Only publishing the code is required to comply with the LGPL, and upstreaming is not a requirement. We do, however, strongly recommend upstreaming: the best results are achieved by working together, and forks are costly to maintain. The kernel code is, of course, covered by its own license, which is out of scope for libcamera. Note that both pipeline handlers and IPA modules can link to third-party libraries if desired, as long as the licenses are compatible. Closed-source IPA modules are fully supported, as discussed before, even if we would like to encourage vendors to follow the lead of Raspberry Pi and open their algorithms too.

So that's how libcamera is the future of cameras on Linux, and how you can join us on the bandwagon. The slides should be available from the Linux Foundation website. I hope you found this interesting, and regardless of whether you are a user, an application developer, or a vendor, please don't hesitate to come talk to us. The libcamera team can be contacted through our public mailing list and IRC channel. Just a heads up: we have moved from FreeNode to OFTC. You can also contact me directly by email. I am now available for questions in the conference channel.