Hello, welcome to this talk about HEVC decoding in the mainline kernel. So today I'm going to talk to you about how the HEVC API will become a stable API. A few words about me: my name is Benjamin Gaignard, I'm a Senior Software Engineer at Collabora. I have been focusing on multimedia topics for a few years now, and you can reach me at my professional address. On the agenda, I will talk to you about some Video4Linux concepts, the HEVC spec, the HEVC Video4Linux API, and finally HEVC API destaging, which is what we are targeting here. So let me introduce some Video4Linux concepts that we use to perform HEVC decoding with a stateless decoder API. First, you have to know that we have both stateful and stateless codecs in Video4Linux. Both are memory-to-memory codecs. Basically, for a stateful video decoder, you take the whole bitstream and you send it to the hardware without any external parsing. The CPU is more or less not involved and there is no parsing stack in userland: you just tag the bitstream with its format and send it to the hardware. For a stateless video decoder, the bitstream needs to be parsed, decoded in part, and some information extracted from it before being sent to the hardware decoder. So the stateful flow is quite simple: you take the compressed data, you extract the bitstream, and you use a kernel API to drive a stateful hardware video decoder. You basically only have to put a tag on the bitstream and send it to a hardware block that will decode it. Maybe there is some firmware involved with that hardware block, but it's more or less a black box: you send the bitstream in, everything is decoded inside the black box, and you get a decoded frame out of it. For the stateless API, the flow is quite different.
You still have your compressed data from which you get a bitstream, but now a software parser is involved to extract the data needed to perform frame decoding with the hardware, so with a stateless hardware video decoder. The software parser is in charge of extracting the data: for example, getting the slices or the frames out of the bitstream, and also extracting the additional parameters, the lists you need to fill, the number of references, the parameters you want to apply. Everything has to be pre-parsed by software in userland. That means the kernel API is quite a bit more complex, since you have to define more structured data and more flow inside the decoder. All of this must be done by the software parser and sent along to the hardware to be able to produce one decoded frame.

For that, thanks to Video4Linux, we now have what we call controls. These are user-settable parameters. They can be enumerated per device, so it's quite user-friendly, since for each device you can know which parameters, which controls, are supported by the device. They are parameters with a range, a type and a step: you can define what is needed for your hardware, describe the range of accepted values and the step between values, and put a type on it. Video4Linux aims to define each control for a specific purpose and to avoid duplication. It means, for example, that if you want a brightness control for your camera, that already exists and you can reuse the generic control rather than defining a specific one for your hardware. A driver can also define custom controls for its very specific needs. So it's quite open, but there really is a goal in the Video4Linux community to keep things as generic and as common as possible, to not create a fork each time and to not have too many device-specific controls.
The other concept we will use for HEVC decoding is the request API. There is one request per buffer in Video4Linux. In one request you can embed multiple controls, and that allows you to do, let's say, more or less atomic decoding per frame. You can set the controls for each frame, you can set the values, the parameters, the extra data or metadata, for each buffer. And userspace can schedule several requests in advance, with different controls and values. That is very useful for video decoding, where you want to be able to set specific data and metadata per frame. It allows you to configure the pipeline on a per-frame basis, so you don't have to flush the whole pipeline. It can also be used for cameras: if you want to set brightness or autofocus, it's really control on a frame-by-frame basis, and that is exactly what we need for video decoding. Together with the Video4Linux controls, that really helps us define an API.

Now some words about the HEVC specification. HEVC stands for High Efficiency Video Coding; it's also known under the name H.265. It reuses the same concepts as the previous codec, H.264. But in HEVC you now have both the DCT, the Discrete Cosine Transform, and the DST, the Discrete Sine Transform, which was not the case for H.264. The block sizes are also more variable with this new codec, from 4x4 up to 32x32. All of this allows a better compression rate for the same video quality, because you have more transforms for encoding the data and you can better adjust the block sizes, so you get better compression at the same, or at least comparable, quality. The specification is publicly available; it defines all the syntax elements, all the values, and the whole decoding process. The spec is quite huge, and everything you need to perform either decoding or encoding with HEVC is inside. So now it's time to talk about the Video4Linux API.
So the first question is how to define the API. First we have to identify what the hardware accelerators need to perform the decoding process. With one Video4Linux request we will have multiple controls to decode one frame, and what we want is that these controls and requests do not rely on only one hardware block. That means the API must be generic and must not fit only one hardware's specification. It's very important to provide something generic and stable over time. We also want to be as close as possible to the specification: in the field naming, in the parameter types, in the structure contents. That's important to be able to implement the API correctly and to easily see what is in the specification and how we have implemented it in the API.

To give you an example of a decoding process as defined in the specification, take the slice decoding process: it takes several steps to derive just one parameter of the HEVC spec. There are quite a few different cases and conditions to compute the value of a single parameter of the API. This parameter, for example, is the picture order count, which gives an ID to each frame, so you can refer to that ID to find the key frame and know which reference frames you are using. It's a very important parameter. But as you can see, it's quite involved, with a lot of conditions, so we don't do that in the kernel: we do it in userland, and userland passes the resulting parameter to the API. So the whole question is how to define all the parameters of the API given what we have in the spec. And there has already been quite some progress on this. The HEVC API is already quite large, more than 250 lines: there is one pixel format for HEVC, obviously, and there are already 7 controls with 7 associated structures, 50 flags, and more than 100 fields and parameters defined. And the API is used by two drivers: Cedrus, and the Hantro hardware block from VeriSilicon.
The HEVC API is in staging territory because maybe the API is not yet complete, and we can discuss that, but the good thing is that in staging we are able to share it and work together to improve it. It's really a good point to be able to say: OK, we put it in staging, it's maybe not ready, but that allows everybody to propose patches, to review them, and to make the API evolve. The drawback is that the API is not stable, so userspace like GStreamer or FFmpeg can be misaligned with each kernel version. From a developer's point of view it is very difficult to follow, because each time the kernel progresses your userland can become misaligned, you can end up with a version mismatch that is not easy to deploy, and you lose a lot of time realigning your userland to the kernel version; that can be really painful.

That leads us to the question of how to destage it, how to move the API to stable, and what is involved in getting it out of staging. We maybe need to prove its maturity; for that we use conformance tests like Fluster. We expect to have more hardware blocks using it, and we need to get a full stack including userspace, but that's more or less a chicken-and-egg problem: as long as it's not stable, nobody will deliver a full stack on top of it to prove that it is working. What else can we do to move it out of staging? Cover the specification: that's a good point, but the specification is quite huge. Cover all the hardware blocks' needs: that's another point, but we only have two hardware blocks; if we cover the needs of those two, will we cover the needs of all incoming hardware? Not sure.

So first, a few words about Fluster. Fluster is a Python tool developed by Fluendo to test the conformance of multiple codecs: for example, you can test H.264 and H.265 just fine. It works by comparing the result of the decoding to a reference, with an MD5 checksum of the result, so it's very easy to add a decoder and test it; very helpful for this. So basically we have a conformance test tool that allows us to always run the same streams, and
we use well-known streams. Fluster gives you a score at the end of your run, so you know which streams passed and which failed. That gives us a way to prove that the software is not regressing and that we are also progressing in the implementation of the spec. The specification from the ITU is more than 600 pages describing the HEVC syntax and the decoding process, so it's quite huge, quite large; everything is inside, and the decoding processes are very well defined. So we try to keep the Video4Linux API naming and structures, every field and flag, as close as possible to the specification, in the naming of the types and of the structure contents. That way the API can rely on the ITU specification and the link between the two is easy to make. There has also been quite an effort in the documentation to say, for each flag and each structure, which chapter of the specification it corresponds to.

Two hardware blocks use the API to decode HEVC: Cedrus, and multiple versions of the VeriSilicon Hantro hardware, which is embedded in multiple SoCs. With the mainline API and the latest patches we are able to cover all the capabilities of those hardware blocks, so the API pretty much covers the hardware needs for now. That leads us to the question: is it time to move it to stable? To summarize why: it's an API used by two hardware blocks and multiple SoCs; an API used by GStreamer and FFmpeg; we have conformance testing with Fluster; and we stay fairly close to the ITU specification. So the question is: what else do we need to move it to stable? Do we need more hardware? Do we need more userland? Do we need to cover more of the ITU specification? That is the question, and we can discuss afterwards in the chat how to build a road to move it to stable. So I will take questions in the chat right after.