So first of all, I want to thank the FOSS Asia organizers for organizing the conference in person; it has been a really weird time after the pandemic. I am Umang. I was basically a desktop engineer, and recently I've turned into an embedded developer. I have been contributing to open source for a long time now, across GNOME, OSTree, Flatpak and other immutable OSes. These days I'm doing mostly embedded work, upstream Linux media and libcamera.

We'll first take an overview of what complex cameras actually mean, what the challenges and complications in this field are, what we have been able to achieve, and finally I'll be happy to answer any questions. Just a disclaimer: the logos and trademarks in these slides belong to their respective owners; we have no intention of claiming their work.

So let us begin with complex cameras. Around 10 years ago, cameras were simple. You had a sensor, and on the SoC you had either a CSI-2 receiver or a scaler that would scale your images and just give you an output on the capture node. Applications could really work with them; they were simple to configure and to develop against. But then this phone came along; I will not name the model, let's see if anyone can guess it. Yep, this is where cameras got complex. From the simple pipeline we went to this: the interfacing and the basic pipeline became very complex. In the red boxes you can see the raw sensors. On the SoC side you might have CCP2, a CSI-2 receiver, statistics blocks like AEWB (auto exposure and white balance), a preview node for previewing images, and separate nodes for still capture, because preview nodes are generally low resolution while still capture needs the highest resolution. And for configuration, the API exposed multiple device nodes, something like /dev/video3, /dev/video4 and /dev/video7, each of which had to be configured to get images out. So from the left-hand side, where we had a simple camera media graph, we went to something like this. OMAP3, Nokia N900, that's correct.

Now, more recently you might have also come across the Raspberry Pi autofocus camera modules. In the middle, the silver plate is basically a very small motor called a voice coil motor, which moves the lens to adjust the focus of the image. The autofocus block will be somewhere here, in the green. So that's yet another node coming up. You can see how the complexity of cameras and their pipelines is increasing day by day.

So what are the challenges here? From the application point of view, applications can manage to a certain extent, but the problem is that it doesn't scale. You have an application and you have the underlying hardware, and the application is very much tied to that hardware; it cannot be ported to anything else. For example, an application written for Raspberry Pi cannot be ported to Rockchip, or to more complex camera hardware like Intel IPU3. Applications were very tied to the hardware, and application developers would need to know the underlying hardware and how to configure it. There are multiple nodes, as you can see, and each node has to be configured in a certain way, at a certain time, at a certain position in the pipeline; you cannot configure, for example, the resizer node before the CSI-2 receiver node, and so on. A sketch of what this per-node configuration looks like follows below.
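To give a feel for what this per-node configuration means in practice, here is a minimal sketch of opening a single V4L2 capture node and setting a format on it directly. The device path, resolution and pixel format are placeholder assumptions; on a complex pipeline an application would have to repeat this kind of setup, consistently, on every node in the media graph.

/*
 * Minimal sketch: configure one V4L2 capture node directly.
 * /dev/video0, 640x480 and YUYV are placeholder choices.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cstdio>

int main()
{
    int fd = open("/dev/video0", O_RDWR); /* just one node out of many in the graph */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct v4l2_format fmt = {};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;

    /* Formats must agree across the whole pipeline, node by node. */
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
        perror("VIDIOC_S_FMT");

    close(fd);
    return 0;
}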
So this makes application development quite hard, and libcamera comes up as a solution, or rather tries to be a solution; let's see how it fills the gap. Before we go ahead, I would like to show you the overall architecture of libcamera, which might seem a bit complex at first. At the bottom is the kernel, where the drivers live, on top of the underlying hardware; the kernel exposes the V4L2 API and the Media Controller API. On the top you can see the adaptation layer, which is basically the application-facing APIs that libcamera provides. The core is libcamera itself plus helper libraries, and the pipeline layer is where things get interesting. On the top right you can see the pipeline handler framework, which is very specific to the underlying hardware: libcamera has a pipeline handler for Raspberry Pi, a different pipeline handler for Rockchip, a different one for IPU3 or i.MX8M Plus; whatever the underlying hardware is, it is abstracted inside a pipeline handler.

In the middle, in the red box, you have the IPA module, the image processing algorithms. These algorithms drive the ISP blocks provided by the vendor, and the ISP is usually implemented on the system-on-chip. libcamera provides a way, in combination with the pipeline handler, to configure the IPA and the ISP present on the system-on-chip. What happens, basically, is this: in the previous diagrams, the red nodes are the raw sensors. You cannot view a raw image directly; it needs to go through the processing blocks. So the processing blocks and the algorithms that run on the platform are encapsulated in the IPA itself. The IPA can be proprietary; we have customers who don't want to open their IP, whose business is driven by the IPA module being kept proprietary. So libcamera has a sandboxed environment where it can plumb a proprietary IPA into itself and run it with the open source pipeline handler to get the images out.

And when you have written an application, libcamera does device enumeration: it discovers what kind of platform the application is running on. If it is a Raspberry Pi, it will invoke the Raspberry Pi pipeline handler, the Raspberry Pi IPA algorithms, everything. If the same application is run on Rockchip, it detects that it's a Rockchip platform, and accordingly the pipeline handler, the algorithms and the entire pipeline are configured that way. A sketch of what this enumeration looks like from the application side follows below.
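As an illustration of the device enumeration just described, here is a minimal sketch against libcamera's public C++ API. It only lists the cameras that the camera manager discovered; the matching of pipeline handler and IPA to the platform happens behind this API.

#include <libcamera/libcamera.h>
#include <iostream>

int main()
{
    /* The camera manager probes the system and picks pipeline handlers. */
    libcamera::CameraManager cm;
    cm.start();

    for (const auto &camera : cm.cameras())
        std::cout << "Found camera: " << camera->id() << std::endl;

    cm.stop();
    return 0;
}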
So to summarize, how do we define libcamera? An open source camera stack and framework for Linux, Android and Chrome OS.

What are the complications? Basically, V4L2 is everywhere and everybody seems to love it. It exists, first of all: in the upstream Linux kernel there is no alternative, for media capture V4L2 is the default API the kernel provides. If in the future the kernel comes up with a different video capture API, libcamera would be happy to support it, and we are working in that direction as well. The same V4L2 API is used for simple cameras, digital TV, set-top boxes, et cetera, so there are areas of conflict, such as color spaces, where some color spaces make sense for a camera but not for TV or set-top-box streams, and vice versa. So there are conflicts like that, but the V4L2 API is what we have, and we have to live with it for now. On the other hand it is widely tested, it is already in use, and it has great interoperability; these are the applications and media frameworks you can see on the screen.

But not everybody loves V4L2, because everything is configured through device nodes, and when the camera pipeline is very complex the application has to take care of various types of nodes, deal with reconfiguration, and configure everything in a certain order, which makes applications quite opaque and hurts portability. Subdevices: as I said, there is an ISP on the system-on-chip, and there are multiple subdevices that need to be configured, so the application developer might have questions like: which subdevice should I configure first? Is this the right way to do it? What format needs to be configured across the entire media bus? There are multiple nodes for a single camera device, while the application developer only wants to think "this is a camera, this is how I configure it"; in the absence of libcamera, he or she has to deal with multiple video nodes: metadata nodes, the CSI-2 receiver, the ISP with its own multiple nodes, the memory-to-memory dewarper.

V4L2 alone isn't enough, because once the ISP comes into the picture the sensors are really raw sensors; they are not RGB or YUV sensors, and YUV sensors are becoming obsolete. Laptops are now using complex cameras; one example is Intel's IPU3 and IPU6. These are really complex cameras, and libcamera has to pitch in to provide a consistent API to access them on Linux. Embedded devices are already using complex cameras; the example I gave you earlier, Raspberry Pi's autofocus module, is a complex camera. But OEMs need custom solutions to manage these cameras; Ubuntu, I believe, had shipped a very proprietary camera stack for Intel IPU3, and we discussed with them how libcamera can be a solution. And there is no portable mobile camera application: as I said, when a complex camera is involved the application is very much tied to the underlying hardware, so portability is just out of the question. But there is a new API, and libcamera aims to solve this.

The caveat is that libcamera needs very good kernel support: the sensor driver should be upstream, the ISP driver should be upstream, and only then can libcamera come in and solve the problem. We live in an age where upstreaming itself is a difficult task, and many BSP kernels and drivers are floating around. C applications don't want to use C++; libcamera is written in C++, but we are working on language bindings as well. Is it finished? Not yet, but we do have releases. We are not guaranteeing ABI stability yet.

So after the challenges and the complications that libcamera tries to address, here are some features and developments. libcamera has a GStreamer element, which means you can use it to encode and stream, mixing in all the other GStreamer elements to build whatever pipeline you wish; the examples are a simple camera viewer and a JPEG network streamer, along with the receiving side. A minimal sketch of the GStreamer element in use follows below.
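Here is a minimal sketch of driving the libcamerasrc GStreamer element from C++. The rest of the pipeline string is an assumption for a simple viewer; for the streaming examples you would swap in encoder and network elements instead of autovideosink.

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    /* libcamerasrc is the element provided by libcamera's GStreamer support. */
    GstElement *pipeline = gst_parse_launch(
        "libcamerasrc ! videoconvert ! autovideosink", nullptr);
    if (!pipeline)
        return 1;

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Block until an error or end-of-stream. */
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

    if (msg)
        gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}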
We have Python support; this was the first language binding that we landed officially in the libcamera repo. Based on it there is Picamera2, which is not a part of libcamera: it's Raspberry Pi's work, a libcamera-based replacement for Picamera, the Python interface for Raspberry Pi's legacy camera stack, which many Raspberry Pi users and hobbyists are very familiar with. We do have an Android HAL implementation; the picture you see is the Chrome OS camera app running with the stream going through the libcamera HAL layer. We do have a V4L2 compatibility layer, a similar LD_PRELOAD trick: it hijacks the V4L2 calls, so if you have an application based on libv4l2 you can just swap libcamera in and out. As for test applications, which we develop for our own use and to exercise the entire library, we have qcam and cam; these are very helpful, and we encourage users to start with them as an introduction to libcamera. We do have a simple hello world for the API: the camera manager starts, I want to get these cameras, run, queue up requests and stop, and you can see what is captured and how it is done; a minimal sketch of this flow follows below.
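Here is a minimal sketch of that hello-world flow against libcamera's public C++ API: start the camera manager, acquire a camera, configure a stream, queue requests, and stop. Error handling and a real event loop are elided, and the fixed sleep is only a stand-in for an application main loop.

#include <libcamera/libcamera.h>
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

using namespace libcamera;

/* Called by libcamera whenever a queued request completes. */
static void requestComplete(Request *request)
{
    std::cout << "Request completed: " << request->toString() << std::endl;
}

int main()
{
    CameraManager cm;
    cm.start();

    auto cameras = cm.cameras();
    if (cameras.empty()) {
        std::cerr << "No cameras found" << std::endl;
        return 1;
    }

    std::shared_ptr<Camera> camera = cameras[0];
    camera->acquire();

    /* Generate and apply a default viewfinder configuration. */
    std::unique_ptr<CameraConfiguration> config =
        camera->generateConfiguration({ StreamRole::Viewfinder });
    config->validate();
    camera->configure(config.get());

    /* Allocate buffers and build one capture request per buffer. */
    Stream *stream = config->at(0).stream();
    FrameBufferAllocator allocator(camera);
    allocator.allocate(stream);

    std::vector<std::unique_ptr<Request>> requests;
    for (const auto &buffer : allocator.buffers(stream)) {
        std::unique_ptr<Request> request = camera->createRequest();
        request->addBuffer(stream, buffer.get());
        requests.push_back(std::move(request));
    }

    camera->requestCompleted.connect(requestComplete);

    camera->start();
    for (auto &request : requests)
        camera->queueRequest(request.get());

    /* Capture briefly; a real application would run an event loop here. */
    std::this_thread::sleep_for(std::chrono::seconds(3));

    camera->stop();
    allocator.free(stream);
    camera->release();
    cm.stop();
    return 0;
}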
We do have PipeWire support. I'm not sure if everybody is aware of PipeWire; it's the upcoming Linux multimedia stack. Last year at a PipeWire hackfest we tried running Chromium, which would go through the XDG camera portal, be routed to PipeWire, then to libcamera, and back to the kernel. The PipeWire integration is already merged, I think, and PipeWire has a much broader exposure to stacks like video conferencing solutions and browsers. In the PipeWire stack, the app, the browser, the streamer, VLC all talk to PipeWire; for imaging, PipeWire can offload to libcamera, which deals with the camera side of things, and for audio you have Bluetooth, ALSA and PulseAudio integration. So PipeWire really sits in between for all your video and audio handling. I've linked a blog post as an introduction to PipeWire.

Snapshot is our upcoming convergent camera app, whose goal is a unified camera app that runs both on Linux-based mobiles and on desktop distributions. It is incubated in GNOME, so we're happy to see that. We do have Flatpak support, which means permission-based access: when an application tries to access the camera, it goes through the XDG camera portal. WebRTC now supports the XDG portal, and libcamera has XDG portal support, which means WebRTC can receive frames coming through the XDG portal and libcamera, and everything can work. In the near future, when the WebRTC and XDG portal work has been merged, you will really be able to access cameras from your browser; these are the two tracking links, for Mozilla Firefox and for Chromium. Once this is done, the browser will work with libcamera for video capture.

Another notable development is PingPeng, developed by rafael2k; sorry if I mispronounce it. It's a fork of Sailfish's harbour-camera, a Qt-based camera application developed for the original PinePhone. The application is really nice because it has manual controls: you can change things like the exposure time and brightness in the UI itself. This is a demo image from testing on the PinePhone using libcamera. Waydroid is a container-based approach, having an Android system on your regular GNU/Linux system, and we tried it and integrated libcamera into Waydroid, so you have a LineageOS image running on your GNU/Linux system, accessing the camera through libcamera.

And one more, I think this is the last one: the PinePhone Pro, our test ground for mobile Linux capture. It's a Rockchip RK3399, a complex camera with a raw sensor and an ISP, and it's already supported by libcamera. I do have a demo for this, let's see. So this is a demo of the PinePhone Pro capturing images with libcamera. You can see the initial support is there, but the IPA algorithms still need to get better. I think that's enough.

And last but not least, a very recent development that I put in at the last minute: a plug-and-play Raspberry Pi USB camera. Here is Raspberry Pi's autofocus camera, the one I showed you first, plugged into a Raspberry Pi. Through the UVC gadget support that our team has developed, it can now be plugged into your regular laptop and used as a webcam. So the entire complex camera supported by Raspberry Pi, using the Raspberry Pi pipeline handler and IPA algorithms, can be used just as a regular USB camera on your laptop. That's pretty much it. Thank you for your attention, and if you have any questions, I would be happy to answer them.

So, not yet, but we are working with devices which have multiple cameras; for example, I think the Chromebooks have multiple cameras, a back camera and a front camera. libcamera can handle multiple cameras, but it depends on how the ISP is configured. Some ISPs only support streaming one camera at a time; we do have platforms where simultaneous streaming from multiple cameras is okay. We are working on something called logical camera grouping, which covers things like mutual exclusion of cameras. So yeah, it's still work in progress, but it's very much dependent on what the hardware capabilities are. Stereo vision, I'm not sure; I haven't seen that done with two cameras. I think that's a layer you have to build above libcamera: you get the frames and then stitch them together yourself.

Pixel binning: you do have pixel binning in the sensor itself, when you get the raw image, so that's a detail you handle in the pipeline handler itself, including how the binning is done. With the distance, you mean the focus? I'm not sure I'm getting it. The depth component, yeah, that's a control; cameras have many different types of controls, depending on what the camera sensor or the ISP can support, and I think depth would be one of them. If the platform and the sensor support it, it can be configured by the application itself; if the platform doesn't support it, the camera won't expose that it supports depth. On Chrome OS, they have their own camera stack, which I think is called the CrOS Camera HAL, the Chrome OS camera HAL, and we do have Chrome OS working with libcamera already. So that's pretty much it. Thank you. Thank you once again.