The goal of this presentation is to explain a little bit about what we are doing right now at the Linux media subsystem. This is one of the hot topics happening right now. It is related to the way we handle cameras, not only on the kernel side but also on the user space side. We'll be talking a little bit during this talk about what a complex camera is in our sense. I will be discussing a little bit how we solved the camera issues we had in the past using libv4l, then we will talk about the modern hardware that is coming together with embedded devices, and finally how we can solve the issues on that hardware and make it work with generic applications.

Basically, when Video4Linux was developed, we were focused on traditional hardware. That hardware has a lot of complexity inside the chipset, so the drivers and the support for it were simply activated by just one single device node. That worked pretty well until cell phones arrived, where part of the complexity was moved to the kernel driver and to user space in order to set up the pipelines internally in the hardware.

This is basically a traditional camera. It is not very different from what you have on your notebooks. It is usually a USB camera; it could be internal to the notebook or external. The thing is, I can fully control this kind of camera with just one device node, usually /dev/video0. With that device node I have full access to the camera: the streaming part and the controls for brightness, gain, white balance and things like that. I can do everything from a single device node. Eventually other nodes could be exposed; if the hardware has a TV tuner, for example, it may expose other device nodes. But as a general rule, just one device node is enough to control this kind of hardware. All generic applications assume this model, so it is very easy to use this hardware: you just need to open whatever application, even your browser, and your camera is ready to use. We call this a device-node-based device.

This is an example of a simple camera, actually from an embedded device: the Chromebook "Snow" camera. It is internally a USB camera. We basically have the sensor, one processing unit inside the hardware, and another unit that does some control, like white balance and things like that. Everything is controlled via the /dev/video0 device node. The graph itself can be retrieved using the media controller, but it's very simple, and applications can use it very, very easily.

The second kind of hardware, which started with the cameras on Nokia's N9 and N900 cell phones, assumes that you can control each single part of the pipeline inside the hardware, so it exposes a lot of sub-devices. I first need to get all those sub-devices, and then I need an application that knows exactly how to set them up in order to get the best resolution and the best image quality. Without a specialized application, this hardware won't work. We call this a media-controller-based device. This is an example of one of those devices; it is exactly the same chipset that was found on the N900, the OMAP3 ISP. All those yellow boxes are device nodes, and applications have to open most of them in order to access and control the hardware.
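To make the contrast concrete, here is a minimal sketch of the traditional single-device-node model, using only standard V4L2 ioctls. The device path, control value and format are just illustrative, and error handling is omitted for brevity:

```c
/* Minimal sketch of the traditional single-device-node model:
 * capabilities, controls and formats all go through /dev/video0.
 * Error handling is trimmed for brevity. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);

    /* Identify the device. */
    struct v4l2_capability cap;
    ioctl(fd, VIDIOC_QUERYCAP, &cap);

    /* Adjust a control, e.g. brightness, on the same node. */
    struct v4l2_control ctrl = {
        .id = V4L2_CID_BRIGHTNESS,
        .value = 128,
    };
    ioctl(fd, VIDIOC_S_CTRL, &ctrl);

    /* Negotiate the capture format, again on the same node. */
    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    close(fd);
    return 0;
}
```

A media-controller-based device, by contrast, spreads the same functionality across many sub-device nodes that all have to be configured consistently before any streaming can happen.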
In this specific case, we have 17 device nodes, so it is really complex for applications to work with this kind of hardware.

When we started working with cameras, we realized very soon that we had trouble. The trouble at that time — I'm still talking about traditional cameras — was that when we added one camera driver, gspca, we detected that several different vendors that manufacture USB bridges for cameras had their own proprietary formats. The reason is that with USB 1.1 the bandwidth of the USB bus was really small, so the camera hardware needed to compress the images, and each vendor came up with its own proprietary algorithm. So a generic application at that time would work only with the hardware that the developer of that specific application happened to have. It was a real nightmare: if you wanted to use a webcam, you had to find hardware that was already supported by your application, or you would need to write some code for that camera to work.

So what we did at that time was to write a library, libv4l, and inside that library we added support for all those proprietary formats. Then a single application could open whatever camera was there. There were some glitches: in some cases, vendors mounted the sensors upside down, so if you just used the application you would see all the images inverted. We needed to add a list of quirks saying that a specific USB ID has an upside-down camera, so the library does the image inversion for you. This way, we found a way of hiding the differences between cameras from the applications. Nowadays most cameras use a standard USB format and the possible formats are reduced, so it is now much simpler than it used to be.

So the goal of the library was to add support for all those different sources of complexity. While we were doing those kinds of things, we also got rid of the Video4Linux version 1 API. It had been in the kernel since 1997; we got rid of it and moved all those things inside the library, in order to drop a kernel compatibility layer that didn't work well, by the way.

Okay, so libv4l actually has three libraries inside. The first one does image processing, the second one does the old API compatibility stuff, and the last one provides the abstraction for applications to use the library. The image processing part is what actually does all the image format conversions. A user space application only needs to support, for example, RGB 24 bits, and that's it: if the application supports that, it can work with whatever camera you have. There are a few other formats, for example YUV, that the application could use. It can select whatever it wants, and of course the hardware exposes everything it supports. So if the application knows more about different formats, it can use them directly; otherwise the library will emulate them for the application to work. So whatever application you have nowadays is using the library in order to get a format that works for it. Basically, the conversions it does are between the different camera bridge formats, like RGB, YUV, Bayer, MJPEG, JPEG and things like that. There are some requests to add support for MPEG; we don't have it yet.
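As a rough illustration of that idea, here is a sketch of an application that only understands RGB24 and lets the library do any conversion. It assumes the libv4l2 wrapper library is installed (link with -lv4l2), and it skips buffer handling and error checking:

```c
/* Sketch: an application that only understands RGB24 can still work
 * with any camera by going through libv4l2; libv4lconvert converts
 * proprietary formats transparently.  Link with -lv4l2. */
#include <fcntl.h>
#include <string.h>
#include <linux/videodev2.h>
#include <libv4l2.h>

int main(void)
{
    int fd = v4l2_open("/dev/video0", O_RDWR);

    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    /* Ask for RGB24 even if the hardware only produces some
     * proprietary compressed format: the library converts. */
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_RGB24;
    v4l2_ioctl(fd, VIDIOC_S_FMT, &fmt);

    /* ... stream with v4l2_read()/v4l2_mmap() as usual ... */

    v4l2_close(fd);
    return 0;
}
```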
It also handles vendor-specific formats on some chipsets. Every time the library provides an emulated format, a flag is reported to the application, so the application knows which formats the hardware supports directly and which formats are emulated by the library. If the application supports a format that is compatible with the hardware, it can use that format directly; otherwise it can pick an emulated format and let the library do its work. It may not be the best implementation, but it is there and the camera will work. That's the whole idea of this flag.

Inside the library we also have things like gamma control, auto white balance and auto-gain, so if your sensor, your camera, doesn't have those features, the library will emulate them for you. There is also a patch pending submission that would provide autofocus as well. And as I said before, it can also support cameras that are mounted upside down and things like that. The whole idea here is that if the camera supports a feature in hardware, for example mirroring, the hardware support is used; otherwise the library will emulate it for you.

The libv4l1 library is just meant for old applications that were written before Video4Linux 2. I was pretty sure that we were done with those applications until I started looking into it, and I discovered that one of my favorite camera applications, Camorama, was still using the old Video4Linux 1 API these days. I actually got co-maintainership of that application and already ported it to Video4Linux 2. I'm planning to use this application also when we start working with complex cameras. It is already available in Fedora; other distros may or may not have the new version yet. I guess most still don't, but it is just a matter of time.

The way this library works, it can actually be used in two ways. You can use the v4l1_-prefixed open, close and so on, or there is an LD_PRELOAD mechanism that you can use when you launch your application, asking the library to take control of the open, close, ioctl and related glibc calls. It will replace the standard glibc implementations with its own. That's one easy way to use the library with Skype and other closed-source applications that don't support it directly.

Finally, libv4l2 and libv4lconvert encapsulate everything into a Video4Linux 2 set of functions. It uses the same concept as libv4l1: basically we have v4l2_open, v4l2_close, v4l2_mmap and so on. If I call using the v4l2_ prefix, it will call the library; otherwise you can also use LD_PRELOAD, and it will do the emulation for you on Skype and other closed-source applications. Internally it calls libv4lconvert, so all those features are available directly when we are using these libraries. You can have generic applications using this in a way that they don't need to support all formats and don't need to know whether the hardware has an inverted sensor or not, because the library does that for you. It was really easy to convert existing applications to use the library, because the only thing you needed to do was to look for open, close, ioctl and so on, and add the v4l2_ prefix to those calls, and that's it.
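Here is a sketch of how an application can check that flag while enumerating formats; it assumes libv4l2 again (link with -lv4l2), and V4L2_FMT_FLAG_EMULATED comes from the standard videodev2.h header:

```c
/* Sketch: enumerate the formats offered through libv4l2 and check
 * V4L2_FMT_FLAG_EMULATED, so an application can prefer formats the
 * hardware supports natively.  Link with -lv4l2. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <linux/videodev2.h>
#include <libv4l2.h>

int main(void)
{
    int fd = v4l2_open("/dev/video0", O_RDWR);
    struct v4l2_fmtdesc desc;

    memset(&desc, 0, sizeof(desc));
    desc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    while (v4l2_ioctl(fd, VIDIOC_ENUM_FMT, &desc) == 0) {
        /* Print the FourCC, description and origin of each format. */
        printf("%c%c%c%c: %s%s\n",
               desc.pixelformat & 0xff,
               (desc.pixelformat >> 8) & 0xff,
               (desc.pixelformat >> 16) & 0xff,
               (desc.pixelformat >> 24) & 0xff,
               (const char *)desc.description,
               (desc.flags & V4L2_FMT_FLAG_EMULATED) ?
                   " (emulated by libv4l)" : "");
        desc.index++;
    }

    v4l2_close(fd);
    return 0;
}
```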
What's the problem with that? The problem is that we had to stick with exactly the same Video4Linux API as before. We couldn't add new things, because otherwise applications wouldn't recognize them. So it was easy to implement, but it has some drawbacks.

What are the main troubles with the approach we took? First of all, there is a maintenance issue: if we add new stuff to the Video4Linux API, we need to write patches for the library to support the new system calls. Second, as we are doing software emulation, there are performance issues. The algorithms there are supposed to be fast, but they don't use special assembly instructions and they don't use any acceleration the hardware could be providing. It is not the same as using, for example, the GPU for doing format conversions; it is slower than that. So depending on your needs, and especially if you are running on embedded devices, this may consume more battery and may not perform very well. It is a way for generic applications to work, but it has side effects in terms of performance. And the biggest problem right now is that it supports only traditional cameras. It was not meant to work with complex camera hardware; it was not designed for that. And that was okay until now, because when you have embedded hardware you usually have different kinds of needs, and maybe Camorama is not what you want on embedded hardware; you need something else. You are, for example, taking pictures on your cell phone, so maybe you need to use two different cameras at the same time, or you switch to a higher resolution when you click a button. So it has different demands.

But the thing is, for vendors the complex camera model is very, very interesting, because they only need a sensor, and everything else is inside the chipset, which can be the same chipset as the CPU. So it is cheaper for a vendor to use a complex camera and move everything to software, instead of having dedicated hardware for handling the camera itself and converting it to a USB bus. So almost all SoC chipsets use this model, and Intel itself is now using this model for notebook chipsets; there are a few models already using this kind of camera in notebooks. So we need a solution now, because otherwise generic applications won't work anymore. What I'm saying is that if you want to use your browser to do video conferencing with someone, you won't be able to do it on Linux anymore if you have newer hardware. We need to fix this, and we need to fix it real quick.

GStreamer also has some issues with libv4l. The basic issue here is that the developer of libv4l had other things to do and left the project, so we don't have an active maintainer for the library anymore, and as we added things to the Video4Linux API, we started having trouble. I won't get into details, it doesn't really matter here, but it's important to know that there are already some known issues that should be fixed someday, or we need to move on to something else.

As I said before, several modern devices are now based on the complex camera approach. It started in 2008 with Nokia's cell phones, and nowadays we have a lot of newer hardware using this model. The first issue started with the Intel Atom ISP driver. It was merged as a staging driver, and it ended up being removed from the kernel.
Nobody had time to improve it, and it was in a really, really poor condition, so it ended up being removed. It was specific to Atom, which is not really a big issue, because you mostly don't see Atom outside of embedded hardware. But the new approach from Intel for mobile is using an IP block called IPU3. We already have some top-of-the-line Dell notebooks using these chipsets. If you have one of those machines, like the Dell Latitude 5285, your camera won't work; you need to buy an external USB camera, because the one inside is a complex camera and we don't have support for it, neither in the kernel driver nor in user space applications. We are working with Intel, Dell and Google in order to solve this problem. Basically, what we want right now is a solution that works for all kinds of Linux-based systems: a normal distribution on a notebook or desktop, Android, and Chrome OS. That's the kind of thing we are now working to solve.

Okay, we have a problem we need to fix. How can we do that? We had a meeting in Japan in July with several involved parties. It was at Google's site; Google is interested in doing this kind of thing. We had people from Intel and from other companies, and we discussed a lot. The conclusion we arrived at is that we should have a new library stack. We are calling it libcamera. The idea is that the application will talk to this library. Internally, the library will have handlers, for example, to set up the pipelines inside the hardware. So when you select a resolution, it will find the best way to provide you that resolution at that frame rate, with the characteristics your software needs. And if your camera needs some algorithms to make the image quality better, for example improving the focus or adjusting the white balance, it will call another part of the library for those 3A algorithms. That part is usually vendor-specific. This model is meant to be open source, using a generic open source license, but in this particular case we believe we will need to run some vendor-specific code. That's what happens right now on shipping systems, and the fact is that vendors usually don't open this kind of software. Yet we are planning to do it in a way that if someone else wants to write their own 3A code, they can. So it will support both vendor-specific proprietary software and open source software. That's what we are planning to do. We will have both the camera stack and those APIs documented, in a way that you can replace them at any time, and applications won't need to know camera-specific details; all of that will be inside the libcamera approach. The idea is to use what is there in the Android Camera HAL version 3 API as a start, and we will change it as needed to make it more generic, in a way that it can be used not only with Intel hardware but with all hardware that has the same requirements. We are of course starting with Intel, because that is what is right now in front of us, and it is concerning us most because it affects our notebooks.

What's next? We will have another presentation tomorrow. Laurent, from Ideas on Board, will be presenting other work related to complex cameras and cameras in general. It will be tomorrow at 4:15, in this room.
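Just to make the division of labor concrete, here is a purely hypothetical sketch, in C, of what such a stack could look like from the application's point of view. None of these names (lc_open, lc_configure, lc_start) are real libcamera symbols — the library does not exist yet at this point — and the stubs only stand in for the pipeline handlers and 3A modules described above:

```c
/* Hypothetical, for illustration only: none of these names are real
 * libcamera symbols.  The point is the division of labor: the
 * application states what it wants; pipeline setup and the 3A
 * algorithms run inside the library. */
#include <stdio.h>
#include <stdlib.h>

struct lc_camera {
    unsigned width, height, fps;
};

/* Stub: a real implementation would enumerate media-controller
 * devices and pick a pipeline handler for this hardware. */
static struct lc_camera *lc_open(const char *id)
{
    (void)id;
    return calloc(1, sizeof(struct lc_camera));
}

/* Stub: a real implementation would configure every sub-device in
 * the pipeline to honor the requested resolution and frame rate. */
static int lc_configure(struct lc_camera *cam,
                        unsigned w, unsigned h, unsigned fps)
{
    cam->width = w;
    cam->height = h;
    cam->fps = fps;
    return 0;
}

/* Stub: a real implementation would start streaming and run the 3A
 * algorithms (auto-exposure, auto-white-balance, auto-focus),
 * possibly in a vendor-specific module. */
static int lc_start(struct lc_camera *cam)
{
    printf("streaming %ux%u@%u\n", cam->width, cam->height, cam->fps);
    return 0;
}

int main(void)
{
    /* The application only says what it needs... */
    struct lc_camera *cam = lc_open("front");
    lc_configure(cam, 1920, 1080, 30);
    /* ...and the library hides all camera-specific details. */
    lc_start(cam);
    free(cam);
    return 0;
}
```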
Tomorrow we will also be launching a website for libcamera. It will have a few things related to this project, and we will be hosting the Git tree for the project on linuxtv.org. It should be available tomorrow as well, with the initial commits, I suspect. And we will have a lot more discussions on Tuesday at the Linux Media Summit. The idea is to get the community involved in the solution. Of course, the first commits will come from Intel and Google, because they have the internal specs of this hardware, but the idea is to invite everyone to collaborate, in order to have generic support for all kinds of cameras inside the library.

That's it from my side. Any questions or comments? Okay. Thank you for your time.