I have given many presentations at conferences around the world, and this one today is special. It's special because during breakfast this morning I was reading the news as usual, and what I read made me angry and sad. It made me sad that in the 21st century, in a country where I have lots of friends, people who are supposed to be role models for us have decided to discriminate between humans just for the sake of it. Before traveling to ELC I hesitated for a long time about the attitude I should have in the face of the political turmoil of the last few months, and this morning I decided that I had to build action on top of the anger, and that I could not condone the situation by staying silent. I want to dedicate this talk to all the people around the world who fight against discrimination of all kinds. Even if I cannot join them on Monday when they rally around the country, I want to show them my support and be part of the sea of pink that started marching on January 21st. So, for all of them, I have colored all the slides in this talk in pink. Thanks.

So let's get started. We're going to talk about video encoding today. Video encoding is quite a complex topic, and by no means am I going to tell you today that I master everything in there. I'm going to give you a brief presentation of what's behind video encoding and how we need to support that in Linux, and especially in Linux user space. A single rule for this presentation: if there's anything that's unclear, if you have any question, you can interrupt me at any time. If there are way too many interruptions and I see that we're running late, I'll ask you to hold the questions for the end of the presentation. But so far, feel free to interrupt.

Video encoding, to start with. What is video encoding? Broadly speaking, from a really high-level view, let's actually look at video decoding first. Decoding is the process of taking an encoded bitstream, so just a big bunch of bits, and turning it into uncompressed frames that can be processed, displayed or used in general. That's a very high-level view of it. Video encoding is exactly the opposite of that: turning uncompressed frames into a compressed bitstream. There are many kinds of video codecs, encoders and decoders. They differ in many ways, but they all have a few things in common that I'm going to try to explain first, because that's what we have based the APIs we have developed for the codecs on.

The first thing that's important to understand is the concept of a bitstream. The encoded bitstream has a format that really depends on the kind of encoder you're using. If you're using an H.264 or H.265 encoder, or VC1, VP8, VP9, AV1, all those codecs have a different way to format the bitstream. That means there are very few points in common between all those different bitstreams. But one of them is that they can all be divided into packets, and we're going to see why that's important. Unfortunately for us, the packets don't always have the same size. They can have the same size with some of the codecs, but in the general case they have variable sizes. When you're lucky, you can easily divide the bitstream, the compressed stream, into packets. For instance, if you're streaming content from the network, you will usually have a network protocol that packetizes the data for you.
So you will receive packets that are already divided, and you will be able to use that information in your decoding process. That's not always the case. If you think about reading a bitstream from a file on disk, for instance, there's no packetization at the file level. So the bitstream itself has to encode the packet boundaries. The way that's handled is also dependent on the kind of bitstream, the kind of video codec, but basically there can be markers in the stream, there can be fields, that allow the CPU, the system, to find the boundaries between packets and then start processing the packets independently.

I'll concentrate in this talk on the video decoding process, because from a software point of view that's the most difficult one for the APIs we care about today. Most of the information I'm going to give you also applies in exactly the same way to video encoding. I'll mention a few differences here and there, but if I don't say anything, you can assume it's the same for encoding and decoding.

So the video decoder is a complex piece of, at the very least, hardware, and in many cases, especially when we're lucky, of firmware running on a dedicated microcontroller that schedules all the processing and controls the hardware. I said when we're lucky, because having a firmware, having a microcontroller, means that all the processing required by the decoder that can't be handled directly by the hardware (one part of that is parsing the input bitstream to find the frame boundaries and extract information that isn't used directly by the hardware) is, when we're lucky, handled by the firmware inside the codec. We don't have to care about it from the CPU, from the Linux point of view. Unfortunately, we're not always lucky, as we'll see later in this talk.

So, very roughly speaking, the video stream, which as I mentioned contains information that can be used to detect packet boundaries, will usually contain information that's intended for a CPU or a microcontroller to process, usually referred to as headers, and then big chunks of data (the green parts on the slide) that are mostly intended for hardware processing. It's important to understand that, because when you don't have a microcontroller in your codec, when you just have the hardware pieces that process the green parts, you still need to do something about the headers and all the information contained there. That's information that will tell you, for instance, what resolution you're dealing with, which is very important to know when you want to allocate the buffers into which you'll decode your frames. The headers will also encode timestamps, and information about the content of the different frames, information that's needed both by the application and by the codec itself to handle the encoding and decoding process.

Looking a bit at what's inside the video decoder, it's usually organized, roughly again, this way. We have a bitstream at the input, as I mentioned before, and decoded frames at the output. In between, we have first a bitstream parser. As I just explained, we need to decode that bitstream, we need to split it into packets, we need to find the headers, and we need to take those headers and extract information from them to be able to configure the decoder.
We need to feed the encoded data directly to the hardware decoder, and feed the information from the headers to the controlling process. One of the things that differs between the encoder and the decoder here, and it's pretty important as well, is that when you have an encoder, the process is pretty much the same but in reverse: instead of parsing a bitstream, you're going to generate a bitstream. Once again, if you're lucky, you will have a microcontroller that does that in your codec, but that's not always the case. There's also more work involved in controlling the hardware in that case, because when you have an encoder, you want to optimize the process so that you get the smallest possible bitstream at the end. And depending on the usage, you might want to rate-control the bitstream as well. You might want to make sure that your encoded bitstream will always fit in the bandwidth you have on the network, in the budget you have for the application. So you need to do rate control. That means gathering information from the uncompressed frames and trying to compute parameters to apply to the encoder to make sure that you fit within your bandwidth limitations.

So the controller can be mostly simple in the decoding case: we just schedule the decoder and configure the hardware registers. But in the encoder case, there's more work involved, and work where codec vendors usually think they have added value. There are lots of heuristics in this rate control process, and when it's well implemented, you make sure to have both the highest possible quality and a controlled bitrate.

The decoded frames are output by the decoder. They are also an input to the decoder, because we're talking about video codecs in this case, not frame-based codecs. We're encoding a video stream, so there are usually only small changes between consecutive frames, and to decode the next frame from the bitstream, the decoder needs previously decoded frames, because the bitstream contains encoded differences between frames rather than the full content of every frame. So the decoded frames need to be fed back to the decoder. They can't just be passed to the application and forgotten; they may still be needed later by the codec.

There are two categories of video codecs. The first one is what we call the stateful codecs. That's, roughly speaking, a hardware encoder or decoder with an associated microcontroller that handles all the bitstream parsing for you, all the bitstream generation, the bitrate control. All of that is handled by the codec. And that means we have, in that case, a really simple situation: a piece of hardware plus microcontroller plus firmware that's a black box for Linux. It takes a bitstream at the input, parses the bitstream, controls the hardware, generates output frames, keeps track of those frames and reads them back from memory when needed to complete the decoding of the next frames, or the encoding process in the other direction. That's when you're really lucky. That's the architecture of the video codecs that we used to have, I would say, up to roughly three or four years ago. We have several drivers upstream for that kind of codec. And there's a really good presentation that was given by Kamil Debski, who was working for Samsung Poland at that time, on using the Video4Linux API for those stateful codecs.
So I'm not going to repeat all the content of that presentation. But what's important there, and it's obviously my main message today, is that the Video4Linux API is the API of choice for this, because everything you need to implement that kind of codec is upstream in Video4Linux today, and because there are user-space applications and user-space middleware (I'm thinking about GStreamer in particular, but not only) that support the Video4Linux codec API. So there's no reason to use anything else. Well, except if you just want to have a much harder life.

I always have a kind of soft spot for OpenMAX. I haven't used it personally for codecs, but I had to use it for camera support quite a few years ago. At that time, we had an OpenMAX implementation. It was a big user-space stack, a two-megabyte binary library that was given to us by the vendor, and it was consuming lots of system memory and crashing all the time. Until someone in the company decided to get rid of it and replace it with a 50-kilobyte implementation with a very simple API, getting rid of the whole stack and the intermediate layers provided by the vendor. I assume that on the codec side it's pretty much the same. And really, especially given that the Google systems (I'm thinking about Chrome OS and Android) are moving towards Video4Linux for the codecs, there's no reason today to use anything else on Linux. If you really want to support Windows, that might be a different story.

So I mentioned Video4Linux. I'm going to go briefly through the API. I assume that most of you have at least heard of it, or heard about video codecs, otherwise I'm not sure why you would be here in the first place. But I'm still going to present the API very briefly. Video4Linux is a really broad API. It can support lots of use cases and kinds of hardware: video capture for cameras, video output, video codecs. I'm going to focus on the parts of the API that are specific to the codecs today, so I'm not going to present anything outside of that scope, but remember that there's more to Video4Linux than what I'm presenting here.

Video4Linux is an API that's based on a device node. You have a kernel driver on one side that exposes a device node to user space, and the user-space application opens the device node and uses the usual POSIX functions there, plus a big bunch of IOCTLs. I'm not sure how many IOCTLs we have in Video4Linux exactly today, but it's over 120 last time I checked; I don't know, Hans, if you have the number. There's really a large number of IOCTLs. Fortunately for you, you don't have to use them all for the video codecs.

So how do we support video codecs with that? In the Video4Linux API, we can split the IOCTLs into roughly five categories. We have all the configuration of formats, cropping, composing, scaling, all that. We have a bunch of formats defined for the codecs today: H263, H264, VC1, VP8, VP9, a few others. We don't have support for all the existing video codecs right now, in the sense that we don't have upstream drivers for all the existing formats. Adding a new format to Video4Linux is pretty simple: it's a matter of adding a new FourCC code and writing documentation saying what it corresponds to. The reason why we haven't added FourCC codes and format definitions for all the codecs existing on the market today is simply that we have a policy that, to have a format defined in the API, you need to have a driver using it.
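As a minimal, illustrative sketch of what talking to such a device node looks like from user space (this is not taken from the talk; the /dev/video0 path is illustrative and error handling is reduced to the bare minimum), an application would open the node, query its capabilities, and enumerate the coded formats supported on its bitstream queue:

```c
/* Minimal sketch: open a V4L2 codec device and list the coded formats
 * it accepts on its OUTPUT (bitstream) queue. Device path is illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
	int fd = open("/dev/video0", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct v4l2_capability cap;
	memset(&cap, 0, sizeof(cap));
	if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) {
		perror("VIDIOC_QUERYCAP");
		return 1;
	}
	printf("driver: %s, card: %s\n", (const char *)cap.driver,
	       (const char *)cap.card);

	/* Enumerate coded (compressed) formats on the output queue. */
	struct v4l2_fmtdesc fmt;
	for (unsigned int i = 0; ; i++) {
		memset(&fmt, 0, sizeof(fmt));
		fmt.index = i;
		fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
		if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) < 0)
			break;
		printf("  %c%c%c%c: %s\n",
		       fmt.pixelformat & 0xff, (fmt.pixelformat >> 8) & 0xff,
		       (fmt.pixelformat >> 16) & 0xff,
		       (fmt.pixelformat >> 24) & 0xff, fmt.description);
	}

	close(fd);
	return 0;
}
```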
Without that policy, we'd get a proliferation of formats that vendors think they need, that later turn out to be quite pointless because we never get a driver for them, but that we'd have to keep in the API forever, and that's really annoying.

We have another set of IOCTLs that allow you to control your device and change all kinds of parameters. My first use of Video4Linux was on the camera side, so if you think about cameras, you can control the exposure time, you can control the gains. For video codecs, you will be able to control your bitrate, for instance, as I mentioned, and a bunch of encoding or decoding parameters. We have 260 standard controls last time I checked, which was a few days ago. We have many custom controls as well, and supporting a new kind of video codec or video format will usually require adding control identifiers that are specific to that encoder or decoder. There's a bunch that are generic, but some are specific to each format and those need to be added. Again, the reason we don't have them yet is exactly the same reason why we don't have the formats defined in the first place: we need to have drivers using them. It's pretty easy to add; we just need a few lines of documentation to explain what each one does.

Then the Video4Linux API has a bunch of IOCTLs to allocate and manage buffers. We support multiple memory models, we support buffer sharing using DMA-BUF, so you can share your buffers between different devices. And once you have configured your device, set the initial value of the controls and allocated your buffers, you can start what we call the video streaming process. Streaming in Video4Linux means capture, means output, means encoding, decoding, so that's starting the hardware, starting the device. That's what we call starting the stream, and stopping the stream is basically stopping the device. In between, you just cycle the buffers around. You queue buffers to the device; for instance, in the decoder case, you're going to queue empty buffers that the decoder will fill with decoded frames. Then at a later point you get signaled that a buffer is ready, you dequeue the buffer, you consume it, you give it back to the driver when you don't need it anymore, and it goes around that way. So that's roughly what the Video4Linux API is about.

We have a framework in Video4Linux inside the kernel called v4l2-mem2mem. The purpose of that framework is to provide a bunch of helper functions that make it easier to implement memory-to-memory devices. A memory-to-memory device is a device that reads frames from memory and outputs frames to memory. That's the typical case for a codec, but it could be a deinterlacer, it could be any kind of hardware that operates from memory to memory. I mentioned on the previous slide that we have a queue of buffers and we cycle around them. When you have a memory-to-memory device, you need two queues of buffers: one on the input side of the hardware and one on the output side of the hardware. The names are a bit misleading here, because for historical reasons they're called output and capture. Capture is pretty easy to understand: that's when you capture video frames from the hardware, and they go from the device to memory. Output is a name that was chosen from a video output point of view, from a display point of view.
So it means taking frames from memory and pushing them to the device. In the case of a memory-to-memory device, the output queue is thus what's at the input of the hardware. That's definitely confusing. If we could go back in time, we would have called this display, and that would have been much easier to understand. But sorry about that, I can't go back in time.

The mem2mem framework has an important limitation: it constrains you to devices that produce one frame for every frame they consume. That's going to be the case for an image encoder, a JPEG encoder for instance. That's also going to be the case for some video codecs, where you give the decoder, well, not one frame but one bunch of packets from the bitstream at the input, and it produces one frame at the output. But in many cases, for the video codecs, that's not the case. You're going to feed your decoder with packets, and the decoder will do nothing and wait until it has enough data to start producing the first frame. Same on the encoding side: your encoder might need a few frames at the beginning of the stream before producing the first piece of the bitstream. So the mem2mem framework is not always a good solution, depending on the kind of codec you have.

An interesting thing about the mem2mem framework is that it supports internal multiplexing of the device. You can have multiple user-space applications opening the same device node, multiple times in different applications. With each file handle, you get two queues of buffers, one on each side of the device, and they operate completely independently. That's not too difficult to implement in a driver for a JPEG codec, for instance, where you basically just have to maintain a hardware context per file handle. It's not mandatory; your driver doesn't have to support that, especially when you have a driver for a complex video codec where it would be really costly to maintain all those contexts.

When we can't use the mem2mem framework, one option is to use two video devices: two video device nodes exposed to user space. You still have a single device, you still have a single driver, but it creates two video device nodes: one output device node that's the input of the device, and one capture device node that's the output of the device, going back to the user-space application. In exactly the same way, you have two queues of buffers, but this time each is associated with a given device node. In this case you can't, with the API we have, multiplex the device between multiple applications. So if you need to share your video encoder or decoder between multiple applications, you're going to need to implement that in user space. For complex video encoders or decoders, if you want real-time performance, most of the time the hardware will not be powerful enough anyway to encode or decode multiple streams at the same time, so that's not really an issue.

So the way it works is, well, I showed you earlier how you're supposed to use the Video4Linux API to control your device. Now you have two devices, one output device and one capture device, and both need to be handled in exactly the same way. You're going to set controls if needed, you're going to allocate your buffers, you're going to queue buffers; a rough sketch of that sequence for a decoder follows.
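The following sketch is illustrative rather than something shown in the talk. It pulls that sequence together for a stateful decoder: allocate buffers and start streaming on the bitstream (output) side first, then, once the decoder has worked out the resolution (this is explained in more detail just below), query the format, set up the capture side and dequeue decoded frames.

```c
/* Rough sketch of the stateful decoder sequence, assuming the multi-planar
 * API and MMAP buffers. out_fd and cap_fd are the output (bitstream) and
 * capture (decoded frames) device nodes; with a single mem2mem node, pass
 * the same fd twice. Format setup (VIDIOC_S_FMT), memory mapping, filling
 * the bitstream buffers and error handling are all omitted. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void decode_sketch(int out_fd, int cap_fd)
{
	/* 1. Allocate buffers for the encoded bitstream (output queue). */
	struct v4l2_requestbuffers reqbufs;
	memset(&reqbufs, 0, sizeof(reqbufs));
	reqbufs.count = 4;
	reqbufs.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	reqbufs.memory = V4L2_MEMORY_MMAP;
	ioctl(out_fd, VIDIOC_REQBUFS, &reqbufs);

	/* 2. Queue a first bitstream buffer (filling it is not shown). */
	struct v4l2_plane planes[1];
	struct v4l2_buffer buf;
	memset(planes, 0, sizeof(planes));
	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = 0;
	buf.m.planes = planes;
	buf.length = 1;			/* one plane */
	ioctl(out_fd, VIDIOC_QBUF, &buf);

	/* 3. Start streaming on the output side so the decoder can start
	 *    parsing the bitstream. */
	int type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	ioctl(out_fd, VIDIOC_STREAMON, &type);

	/* 4. Once the driver has signalled that the resolution is known,
	 *    retrieve it on the capture side. */
	struct v4l2_format fmt;
	memset(&fmt, 0, sizeof(fmt));
	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	ioctl(cap_fd, VIDIOC_G_FMT, &fmt);
	/* fmt.fmt.pix_mp.width / .height now hold the decoded frame size. */

	/* 5. Allocate the capture buffers, queue them (not shown) and start
	 *    streaming on the capture side. */
	memset(&reqbufs, 0, sizeof(reqbufs));
	reqbufs.count = 4;
	reqbufs.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	reqbufs.memory = V4L2_MEMORY_MMAP;
	ioctl(cap_fd, VIDIOC_REQBUFS, &reqbufs);

	type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	ioctl(cap_fd, VIDIOC_STREAMON, &type);

	/* 6. Main loop: dequeue decoded frames, requeue them once consumed,
	 *    and keep feeding bitstream buffers on the output side. */
	memset(&buf, 0, sizeof(buf));
	memset(planes, 0, sizeof(planes));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.m.planes = planes;
	buf.length = 1;
	ioctl(cap_fd, VIDIOC_DQBUF, &buf);	/* one decoded frame */
}
```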
Again, this is the decoder use case: we have encoded bitstream buffers on one side and decoded frames on the other, and you start streaming on those two devices and cycle the buffers around. There are a few pictures missing because I was really upset with LibreOffice this morning, which lost some of the pictures from the presentation. So try to imagine the cute little penguins here and here. Hopefully it will still be understandable in the next few slides. If anyone wants to fix the bug, kudos to them. Yeah, indeed, there are quite a few missing.

An important thing about codecs is that, when it comes to header parsing, when you feed an encoded bitstream to your decoder, you have absolutely no idea what resolution you're going to get out of it. So you can't allocate buffers for the uncompressed frames ahead of time, because you don't know how big they need to be. That means you need to feed your bitstream to the decoder and start streaming on the output side, the compressed bitstream side, so that the decoder can start decoding the bitstream and say, OK, that's a full HD stream. Then the driver will notify you; we have an IOCTL in Video4Linux that can be used to retrieve information about the video format, the resolution. So the hardware is going to notify you, you get the resolution, you allocate the buffers and you start streaming on the capture side. That's one thing that's specific to the video codecs: you sometimes need to start parsing the bitstream ahead of time. That was the stateful codecs. Any questions so far? Yes?

No, on the capture side. You start streaming on the output side, the decoder starts decoding the bitstream, and then, when it has enough information, you do a get-format on the capture side. Well, you can call other IOCTLs on the capture side if you want, but that's the one you have to call before allocating the buffers, because you don't know how big the buffers need to be. Yes?

So the question is: when using a single device node, and still having two queues, an output and a capture queue, and queuing and dequeuing buffers, do we use the same buffers for the encoded bitstream and for the uncompressed frames? No, we use two separate buffer queues. The IOCTLs in Video4Linux that deal with buffers have an argument that tells whether they apply to the output queue or the capture queue, so when you queue or dequeue a buffer, you can tell for which queue you want to do the operation. That's two separate sets of buffers; we don't share the buffers between the encoded side and the decoded side.

So now, moving on to the stateless codecs. Well, they're much more annoying. There's still hardware, of course (if there were no hardware we wouldn't have any problem, we wouldn't have to implement any driver and I'd be done for today), but the hardware is pretty simple. It handles all the heavy lifting, all the complex operations that would be really costly to do on the CPU. But everything else needs to be handled on the CPU side. As I mentioned, that's bitstream parsing, that's controlling the decoder itself with all the parameters extracted from the bitstream headers, and that's bitrate control when you're encoding. So that's what needs to run on the CPU, and ideally that means user space: we don't want to do that in the kernel. We're dealing here with an encoded bitstream that you've downloaded from the internet.
Passing that to a kernel driver unchecked, unvalidated, and expecting the kernel driver to be bug-free, is a really, really bad idea. An even worse idea, which I've actually seen in some implementations (hopefully they haven't made it to the market), comes from the fact that the bitstream can be encrypted. When you're watching Netflix, the bitstream you get from the internet is encrypted. They don't want you to have access to the decoded frames on the CPU side, because you could capture them; that's really, really bad, it's kind of the end of the world in that case. But neither do they want you to access the bitstream after it has been decrypted. Decryption is usually outside the scope of the codec itself, it happens beforehand. So that means the decrypted bitstream sitting in memory somewhere can't be accessed by Linux. On the decoded side it's not too difficult: there are APIs where you can allocate memory that's not accessible by Linux, based on the secure mode and the security level in which the CPU is running, and the buffers will be shared directly with your display controller hardware, for instance, so the CPU doesn't need to touch them anyway. But here, the CPU needs to parse the bitstream, so the CPU needs to touch the decrypted content. The vendors don't want to do that in Linux. I mentioned that I don't trust the Linux kernel to do that job, for security reasons; well, they have moved the code to TrustZone. So you have a trusted firmware running a bitstream parser and taking unvalidated input from the internet. That means you have a pretty much guaranteed way to get secure-mode access to your device. Very neat. Yeah, very useful. Actually, it is very useful if you want to hack a device that's locked by the vendor, so maybe we should push them in that direction. But if you want a sane design, bitstream parsing and controlling the decoder shouldn't be in the kernel.

Another reason for that, especially on the encoder side, not really for the decoder, is that for the encoder, as I mentioned, we have bitrate control, and many vendors want to keep that closed source. They have lots of heuristics that optimize the process, and they're not comfortable with opening that up. If it's inside a kernel driver, that's obviously an issue. So that's something we need to push to user space.

How do we do that? Well, let's look at the interface of the real hardware, because that's the device that's going to be handled by your kernel driver, so that's the interface that will need to be exposed to user space. At the input of the decoder, you're getting data that needs to be fed directly to the decoder, and possibly parts of the headers that are not meant for the controller but need to be sent to the hardware, and possibly other information as well. For instance, mostly on the encoder side, you might need to pass big data tables to control the encoding, the quantization process, to make sure you minimize the size of the bitstream. So there is video data that needs to be passed, and there is ancillary data that needs to be passed as well. One option in the Video4Linux API is to group that in a single multi-planar buffer, using multiple planes for the different pieces of data that need to be passed. Those buffers are still transmitted to the decoder using the QBUF and DQBUF IOCTLs; everything that's in capital letters on the slide is a Video4Linux IOCTL.
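As an illustration of that option, queuing such a buffer could look roughly like the sketch below. The two-plane layout shown here is just one possible convention rather than a standard V4L2 pixel format, and the buffer is assumed to have been allocated and filled beforehand.

```c
/* Sketch of queuing one multi-planar OUTPUT buffer that bundles the
 * bitstream data for the hardware (plane 0) with ancillary data such as
 * quantization tables (plane 1). The two-plane layout is an assumed
 * convention defined by a hypothetical format, not a standard V4L2 one.
 * The buffer is assumed allocated with VIDIOC_REQBUFS and filled already. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int queue_bitstream_with_metadata(int fd, unsigned int index,
					  unsigned int slice_size,
					  unsigned int meta_size)
{
	struct v4l2_plane planes[2];
	struct v4l2_buffer buf;

	memset(planes, 0, sizeof(planes));
	planes[0].bytesused = slice_size;	/* data fed to the hardware */
	planes[1].bytesused = meta_size;	/* ancillary data / tables */

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = index;
	buf.m.planes = planes;
	buf.length = 2;			/* number of planes */

	return ioctl(fd, VIDIOC_QBUF, &buf);
}
```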
On the other side, we get decoded frames; please imagine that you have a cute penguin here. And again, we might need to extract information, ancillary information, metadata, from the decoder or the encoder, that needs to be passed back to user space, possibly in large quantities, and in most cases data that will need to be fed back with the next frame or with a subsequent frame. So again, we can use the multi-planar buffers that Video4Linux offers.

When we have control parameters that need to be set, we have the control IOCTLs, set and get controls, that can be used to control the decoder from the controller running on the CPU. It depends on the parameters: if with each frame you need to pass a data table that's 200 kilobytes, you don't want to pass that through the control API, it would be inefficient, so you pass that with the buffer. If you have a few integers that you need to set and get, the control API is there for that. As I mentioned, we have lots of Video4Linux controls that are defined specifically for the codecs, to handle all kinds of codec parameters. We can add more for the formats we don't support yet, but that's the API of choice.

The problem we have is that, well, I mentioned that we have parameters that are associated with the data inside a buffer, but the controls that we set and get also need to be associated with either encoded or decoded buffers. The reason for that is that when you encode a frame, you want to say: I want to apply this exact set of parameters to that specific frame. The control IOCTLs and the QBUF and DQBUF IOCTLs are not synchronized with each other in Video4Linux: when you set a control, it's applied immediately. And when you have a queue of buffers, and you queue a few buffers ahead of time to make sure that you never run out of data to be decoded and that the decoder is always busy, it's pretty difficult to know when to call the set-control IOCTL to make sure it will be applied to the right buffer.

So for that, we have developed a new API in Video4Linux called the request API. A very high-level summary is that it allows you to group data, controls, and frames, either encoded or decoded, into an object called a request. So you build your request and you put everything in there. The way it works is that we start by allocating the request, we set controls, we queue a buffer for that request, and then we end up queuing the request itself. If you disregard the first and the last steps, the intermediate steps are just two calls (though you might need to set lots of controls, so that could be multiplied by many), and everything in the middle is the standard Video4Linux API. The way it works is that if you don't call step one and step four, all the controls are applied immediately, and when a buffer is queued, it's processed by the hardware at a later point, when the hardware gets to that buffer in the queue. With a request, if you allocate a request and set controls and queue a buffer for that request, then those operations won't be applied immediately: the information is just stored inside the request object, which is then queued for processing.
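A sketch of that four-step sequence is shown below. Since the request API was still work in progress at the time of this talk, the sketch uses the names the API eventually got when it was merged upstream (MEDIA_IOC_REQUEST_ALLOC and MEDIA_REQUEST_IOC_QUEUE from linux/media.h, V4L2_CTRL_WHICH_REQUEST_VAL, V4L2_BUF_FLAG_REQUEST_FD); the work-in-progress version discussed here may have differed, and error handling is omitted.

```c
/* Sketch of the request flow: allocate a request on the media device,
 * set a codec control against the request, queue a bitstream buffer tied
 * to the same request, then queue the request itself. Names are from the
 * upstream request API as eventually merged; the version discussed in the
 * talk was still in flux. Error handling is omitted. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>
#include <linux/videodev2.h>

static void queue_one_frame(int media_fd, int video_fd, unsigned int buf_index,
			    unsigned int ctrl_id, void *ctrl_data,
			    unsigned int ctrl_size)
{
	/* 1. Allocate a request; we get back a file descriptor. */
	int request_fd;
	ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &request_fd);

	/* 2. Set the per-frame codec control(s) against the request.
	 *    ctrl_id is a codec-specific control, passed in by the caller. */
	struct v4l2_ext_control ctrl;
	struct v4l2_ext_controls ctrls;
	memset(&ctrl, 0, sizeof(ctrl));
	ctrl.id = ctrl_id;
	ctrl.ptr = ctrl_data;
	ctrl.size = ctrl_size;
	memset(&ctrls, 0, sizeof(ctrls));
	ctrls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
	ctrls.request_fd = request_fd;
	ctrls.count = 1;
	ctrls.controls = &ctrl;
	ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &ctrls);

	/* 3. Queue the bitstream buffer for that request. */
	struct v4l2_plane plane;
	struct v4l2_buffer buf;
	memset(&plane, 0, sizeof(plane));
	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = buf_index;
	buf.m.planes = &plane;
	buf.length = 1;
	buf.flags = V4L2_BUF_FLAG_REQUEST_FD;
	buf.request_fd = request_fd;
	ioctl(video_fd, VIDIOC_QBUF, &buf);

	/* 4. Queue the request itself; the driver applies the controls when
	 *    it processes this buffer. Reuse later with
	 *    MEDIA_REQUEST_IOC_REINIT, or close(request_fd) to drop it. */
	ioctl(request_fd, MEDIA_REQUEST_IOC_QUEUE);
}
```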
So, again, we have a queue of requests, and the hardware, or the device, well, the driver in the first place obviously, will go through that queue, and for each request that it processes it will apply the controls to the hardware and make sure they apply to the right memory buffer. So that's how the request API works.

In a meeting we had in Berlin right after ELCE last year, we decided to handle request objects from user space using file descriptors, the same way that DMA-BUF uses file descriptors to manage buffers from user space; we're going to use exactly the same mechanism. There are a few advantages to that, and one of them, probably the biggest, is security, because the first implementation of the request API just used an integer handle that referred to the request, which meant that any user-space application could just try to guess the ID of a request and then set or get parameters from that request, and that was a security issue.

So now we have file descriptors. When you allocate a request, it creates a request object in kernel space with a file descriptor pointing to it. If you close that file descriptor, the underlying request is deleted and doesn't exist anymore. When you have your request, and you have called the standard Video4Linux IOCTLs with an extra parameter to specify that request, and it's full of configuration data, you can then queue the request. The request will still be referenced by your file descriptor, but also by the queue of requests that are pending processing by the hardware. Then you have two options. You can close the file descriptor immediately, because you decide you don't need the request anymore; the request is then in a state where it's still referenced by the queue, and when the processing of that request completes, it's deleted automatically. Or you can keep the request around. At some point processing will complete, and the request will end up in a state where it can't be reused immediately but is still accessible from user space. The reason for that is that, as I mentioned, you can set parameters in the request, or get parameters: once a frame has been processed, once the request is complete, you might want to get information back from the decoder or the encoder that you will need in the controller, and the standard Video4Linux IOCTLs can also be used on a request to get that information back. That's why you want to keep the FD around. You can then close the FD, in which case the request is deleted, but if you want to reuse the request, there's a specific re-init call that you use when you want to reuse it.

The reason for that is that requests are associated with a state of the device, with lots of configuration parameters. When you create a request, it duplicates either the existing state of the device or, at your request, it copies another existing request: the allocation call has an option to copy an existing request or to duplicate the current state of the device. Then you use the request, it completes, and if you want to reuse it, well, if you don't think about it, it will contain parameters that were applicable a few frames ago but are not applicable anymore. So you need to explicitly say: I want to reuse the request and copy parameters, either the current parameters of the device or the parameters of another existing request. You have to do that explicitly, and then you can loop around.
Not allowing that, deleting the request once it completes and requiring you to allocate new ones, would mean that you could run out of FDs during a video encoding or decoding process, which was an issue. So we need to be able to allocate a bunch of requests beforehand and reuse them. That's pretty much all there is to the lifetime management of those request objects. When a request completes, you can get information out of it, controls, all kinds of data, using the Video4Linux API: the request object stores that information and you retrieve it using standard IOCTLs. So, to complete the previous diagram, we have two extra operations here that you can use to get information back, and then you have queued a buffer through the request, so you have queued an encoded frame and you dequeue a decoded frame on the other side. It's all standard Video4Linux, except that all the operations are done on those new request objects.

On top of your controller, you have your Video4Linux application; all of that lives in user space. You don't want to have the controller code inside the application, because it's highly device-specific: it uses the Video4Linux API, but it needs to know how the encoder or decoder works to set the right parameters and pass it the right kind of data, and you don't want all your Video4Linux applications to know about all the codec vendors. So that has to be split somehow. How do we do that? What API do we use here? I mentioned this is Video4Linux; you have existing Video4Linux applications that deal with the stateful codecs we saw before. Those applications use the Video4Linux codec API, and because the controller running in user space here hides everything that's device-specific, above the controller you effectively have a stateful codec: the state is maintained there. So you want to reuse the same Video4Linux applications you have used before; you don't want a new, specific API here. How do we handle that?

Fortunately, we have a user-space library that comes to the rescue. libv4l2 is a very simple library that is basically a wrapper around the Video4Linux POSIX calls, open, close, ioctl, and all the individual Video4Linux IOCTLs. You can use it explicitly in your application, which is recommended, but if you have a Video4Linux application that doesn't use libv4l explicitly, you can LD_PRELOAD the library, and the library will then intercept all the system calls, all the calls to the libc functions, do all the processing it needs as if the application had called the library explicitly, and then use the system calls to interface with the device. That's completely transparent: an application that is not compiled against libv4l can still use it.

libv4l supports a concept of plugins, and that's something that becomes really important for the codecs. A plugin is an external library that can intercept all those calls, open, close, read, write, the ioctl call, and all the Video4Linux IOCTLs. So that plugin can do all the processing required by a specific video encoder or decoder: it's a hardware-specific, device-specific plugin. It will, first of all, intercept the format IOCTLs. I mentioned that with multi-planar buffers you can bundle video frames and metadata. The metadata is needed by the controller inside the plugin, but it's not needed by the application on top of that, and you don't want the application to see it. So those IOCTLs will be intercepted, so that the application thinks it's using a plain H264
format, but behind the scenes it gets converted into the hardware-specific format, H264 data plus metadata. Going from user space to kernel space, the plugin adds the metadata, and in the other direction it removes it, so it's hidden from user space. We do exactly the same thing with the buffer IOCTLs. Once again, I apologize for the missing pictures; they were still there when I checked this morning, and I will make sure they are there in the latest version of the slides I upload. So, all those IOCTLs that deal with requesting buffers, allocating them, queuing them, dequeuing them: at the output of a decoder, you want to get decoded frames in temporal order, in the order in which they need to be displayed. The bitstream might contain the frames in a different order, which means the decoder will produce frames in a non-temporal order. The plugin can reorder that: it can maintain buffers and buffer IDs internally, dequeue the buffers in an order that's known to the plugin, and then pass them back to user space through the dequeue-buffer IOCTL in the temporal order expected by user space. To allocate the buffers, you need to know the size and the number of planes, so the same way we intercept the format IOCTLs to add the extra metadata planes to the buffer, we do the same with the buffer-related IOCTLs.

We also need to intercept controls, because we have a bunch of codec-related controls defined in Video4Linux that are used by applications. Those controls may be exposed to user space but not exist at the hardware level, because the control process runs on the CPU, inside the plugin. So when you get a set-control call, you want to use that value internally but not forward it to the kernel. Some controls might just pass through. You could also have controls that are exposed by the kernel but that don't need to be exposed to user space, because they're specific to your device and need to be handled by the plugin. By intercepting all that, your device-specific plugin can hide controls from user space, or expose new controls to it.

A brief word about licensing and documentation. Licensing first: a Video4Linux driver is a kernel driver; some people might disagree, it might be a bit of a gray area, but basically that's GPL, especially for upstream drivers. Obviously, from a community point of view, if you want to submit a kernel driver, that's GPL code. libv4l2 in user space is an LGPL library, so you can have a closed-source user-space application using it, no issue with that. You can also have a closed-source plugin for your device, especially for video encoders: if you decide that you don't want to open the code of your bitrate control algorithm, that can stay closed in the plugin. Obviously the community will push to have open-source implementations here, and the rough consensus we have is that, to accept a Video4Linux driver for merging in the upstream kernel, we want an open-source implementation in user space. It might not be the most optimized one, it doesn't have to have all the heuristics and all the optimized code for bitrate control, but it has to be something that works. A vendor can then distribute a vendor-specific plugin that's closed source and, well, provides better performance. But it also means that, since we have an open-source implementation, someone in the community might want to
develop an open-source alternative, optimize the code, and in the end have a fully open-source implementation that might even be better than the one provided by the vendor.

From a documentation point of view, I mentioned controls, and I mentioned metadata that can be associated with the buffers. The requirement we have is that if you want to define new controls, they need to be documented: you need a few lines of documentation explaining what they do. The format of the metadata that you associate with your bitstream or uncompressed buffers when you pass them to the kernel has to be documented as well.

What's next? Two things. First, the request API: that's work in progress. We have had lots of discussions about what the API should look like; we discussed it with Google in the context of codec drivers, and that's work I'm personally doing at the moment, so I expect a new version of the patches to be submitted around, well, before mid-March. That's one thing. The other one is that the request API is not limited to video codecs. I mentioned it can be used to associate parameters with frames, and there are many other use cases for that. One of them is the implementation of the Android camera HAL version 3, where you need to be able to set capture parameters for every single frame, so the request API needs to handle that. Another one is performance optimization; that's something I'm working on with Renesas at the moment, for hardware that needs to be reconfigured with every frame, potentially changing the size of the buffers. Using the traditional Video4Linux API, that means you have to process a frame, stop the stream, allocate new buffers with a different size, restart the stream, process one frame, and repeat, which is extremely inefficient. With the request API, you will be able to associate formats and resolutions with every buffer, and that allows you not to stop streaming for every frame you process. That's a very important performance optimization.

Any questions? Yes, I'll give you a microphone, sorry.

Question: You mentioned that the plugin is able to intercept the system calls, and you mentioned that open, close, read, write and ioctl are handled. But some applications use select, and they may have several file descriptors, one of them being a V4L2 file descriptor and the others something else. How do you handle that?

So, first of all, the application can indeed use the select or poll system calls with the video device node file descriptors. As I mentioned, when you use Video4Linux and libv4l, you still have a kernel driver here that exposes the Video4Linux API, so your driver should not expose a different API or other device nodes, because that wouldn't be Video4Linux anymore. So if you still use the Video4Linux API and the same video node, you shouldn't need to intercept the select call. But there could be other reasons why you'd need to intercept it. We don't support that at the moment, but the plugin API could evolve to add support for it if needed, so if there are valid use cases, you can submit them to the mailing list and submit a patch to intercept that system call as well.

Question: The typical use of select is to block before the dequeue IOCTL: you call select to make sure that at least one buffer is available, and only then do you call dequeue, expecting that it will not block. So in those cases, like when you are managing the decode order versus the display order, you may actually want to...
That's right, you might indeed want to: there might be a buffer available, but you don't want to pass it to the application yet. In that case, indeed, I'm totally fine with adding support for intercepting those calls as well. Thank you. Any other question?

Question: You just mentioned the camera HAL version 3 interface in Android, so I want to comment about per-frame control. There is a concept called group hold in a sensor: if you apply a gain or exposure value, it doesn't take effect immediately, it takes effect two frames later. So you might want to consider how you synchronize the dequeueing and also the controls.

Yes, and that's one of the reasons why we have a queue of requests. You mentioned that, indeed, devices and hardware can apply the parameters a few frames after they are actually sent to the hardware. So you need a queue of requests for that: you need to have at least two or three requests in the queue, depending on the device, so you can send the parameters to the device ahead of time. It means that your driver doesn't need to process requests one by one without looking at the other ones; it can actually look at the whole queue and process some of the parameters in advance, to make sure they're applied at the right time. You're welcome. Any other question?

Question: One common use of video decoders is for games, real-time game streaming, so latency is going to be a really big issue, and latency can suffer if you have multiple buffers. How does V4L2 deal with that?

So, the more buffers you have, obviously, the more latency you have. It's always a balance between making sure you don't run out of buffers and minimizing latency, so that's going to depend on your use case. Video4Linux doesn't have anything specific to handle latency at the moment, but if you minimize the number of buffers to exactly what you need to make sure you don't have underruns, then that should not be a problem; from an API point of view at least, Video4Linux will not introduce any extra latency. We can have issues when it comes to really low latency. Especially on the encoding side, and actually on the decoding side as well, you might want to start getting part of the output buffer before it's fully filled. For a decoder, you might say: OK, I know that it's going to take some time to fill a complete decoded frame, but I also know that the speed of my display controller is such that, if I give that buffer to the display once it's about two-thirds filled, the display will not catch up before the decoder has had time to finish the buffer. You might want to do that preemptively. We don't have support for that at the moment; it's something we have discussed before, ideas are definitely welcome, and I expect it's something we will need to address at some point, but we haven't yet. Thank you. You're welcome.

I think it's going to be break time in five minutes, so maybe one more question if there is one, but you can always talk to me afterwards.

Question: I've got a question about the request API; it may be a silly question. The question is about passing the file descriptor of the request to another process. Is it doable, or does it make any sense, for one process to prepare the whole processing and then pass the descriptor?

It's completely doable; a file descriptor...

Question: I know that we can pass the file descriptor, that's not the problem, not the technical one. Does it make any sense for one process to prepare the processing and then pass the request to another process? Would it be usable for codecs?

I wouldn't think so.
The API will allow you to do that. In many cases I can't really think of a use case where that would be needed, but if you have one, it's totally doable. I would try to minimize it, because even if it's doable, passing the file descriptor takes a bit of time, so don't do it just for the sake of it. But if you have real use cases, I'm totally fine with that.

Question: And how will it look from the implementation point of view? If I prepare the processing and then pass the file descriptor to another process, will all the buffers I submitted internally in the kernel point to the same buffers, or will the buffers be copied?

The request object contains parameters, pointers to buffers, and all of that is associated with the request. So if you can get hold of a request in a different process, because you passed the FD, then you can read the values of the controls, you can override them, and you can interact with the buffers.

Thank you. You're welcome. OK, I'll let you enjoy the coffee and tea, and if you have any other questions, feel free to talk to me either now or later today, or contact me by email. In the uploaded slides I have a few pointers to resources: documentation about the API, the presentation I mentioned about the stateful codec API, and my email address. So feel free to contact me if you have any questions, by email or on the lists, in English of course. Thank you.