Dobrý večer, everyone. Contrary to many people, I feel almost at home in Czechia. And I will help the room host in pronouncing my name: my name is Andrzej Pietrasiewicz. Today I will tell you about stateless encoding in Video4Linux.

Given how symmetric Maxwell's equations are, in 1971 this man, Leon Chua, postulated a fourth fundamental electronic component to join the ones already known. That device is called the memristor, and it changes its resistance depending on how much current has flowed through it, so it can be used to build memories. And 2023 is the 1971 of video codecs in Linux, because stateless encoders are coming to Linux. And I know what I am talking about, because I have been dealing with one.

By the end of this talk you will understand the difference between stateful and stateless, and you will be able to use stateless encoders. But before we get there, we need to talk about what stateful and stateless mean. Then we will talk about the user space API you need to use to control stateless encoders; that is the part where I will be showing you links to the kernel patches and the GStreamer merge requests. We will talk about rate control, why it is important, and why you want it applied when encoding. And we will outline possible future directions for encoders. About 30 minutes from now we will finish with a questions and answers session.

So let us now talk about the difference between stateful and stateless codecs. But first, let's make it clear what codecs are. A codec is either a program or a hardware device which is used to encode or decode a data stream or a signal, and this talk is about hardware codecs, which are often found in systems on chip. In a very general sense a codec is also a specification, and modern codecs usually only specify what constitutes a valid bitstream and how to decode it, while saying little to nothing about encoding, that is, about how to come up with a valid bitstream. This information will be important later.

In Video4Linux, codecs are represented as memory-to-memory devices, and to understand what that means we need to peek into the ancient history of Video4Linux. Some 30 years ago the predominant use case was an analog TV grabber card: the user connected an antenna to the card, and the card received the TV signal and presented the received video frames to the computer. These were called capture devices, because they were used to capture data from the hardware into computer memory. Of course there also existed devices which did the converse: given frames in computer memory, they generated a signal suitable for broadcasting. Those were called output devices, because they were used to output data from the computer to the outside world. When hardware codecs became popular, they were modeled as both output and capture devices: the output part is for transferring data to the hardware, raw frames in the case of encoders, and the capture part is for transferring the processed data from the hardware to computer memory, encoded frames, that is the bitstream, in the case of encoders. Such a device is a memory-to-memory device, and this talk is about stateless encoders of this kind.

So obviously you now want to know what stateless codecs are. In stateful codecs the current state of encoding or decoding is kept and maintained in the hardware, whereas in stateless codecs the current state of encoding or decoding is kept and maintained outside the hardware. This seemingly innocent distinction does have consequences.
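Going back to the output/capture pairing for a moment, here is a minimal C sketch of configuring an encoder's two queues. The queue types and NV12 are standard V4L2; that a stateless VP8 encoder would expose V4L2_PIX_FMT_VP8 on the capture side is my assumption for illustration.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* A memory-to-memory encoder exposes two queues on one video node:
 * raw frames go in on the OUTPUT side, and the encoded bitstream
 * comes out on the CAPTURE side. */
void configure_encoder(int video_fd, unsigned int width, unsigned int height)
{
    struct v4l2_format fmt;

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;   /* to the hardware */
    fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12; /* raw frames */
    fmt.fmt.pix_mp.width = width;
    fmt.fmt.pix_mp.height = height;
    ioctl(video_fd, VIDIOC_S_FMT, &fmt);

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;  /* from the hardware */
    fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_VP8;  /* assumed bitstream format */
    ioctl(video_fd, VIDIOC_S_FMT, &fmt);
}
```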
Let's for example look at reference frames, which are an important part of the encoding or decoding state. In virtually all modern codecs the compression is achieved in part by applying a transform to move from the pixel domain to the frequency domain, and then quantizing the transform coefficients. But instead of transforming and quantizing the frame contents directly, a difference between a frame and its prediction is transformed and quantized, and this results in better data compression. That is possible because both the encoder and the decoder generate exactly the same predictions. The final stage is entropy coding, where more frequent data is assigned shorter code words while less frequent data is assigned longer code words. Conceptually, then, the pipeline is bitstream = entropy_code(quantize(transform(frame − prediction))).

In video codecs there are two strategies to predict the contents of the next portion of data: either to refer to an already processed part of the current frame, or to refer to some other already processed frame. The first is called intra prediction, and when it is applied, such a frame can be decoded without referencing any other frames; it is a self-contained frame, sometimes called a keyframe. The second strategy is called inter prediction, and when it is applied, reference frames are needed to decode such a frame, and reference frames can in turn depend on other frames, so this can create quite a complicated set of dependencies.

Now let's think about what happens when one wants to switch from encoding one sequence to encoding another sequence on a stateful encoder. When the new sequence starts being processed, the old context is gone. So when you want to resume encoding the first sequence, the original context is not there. You must restart from the last self-contained frame, keep encoding until you hit the frame where you left off, and keep discarding the resulting bitstream, because these frames have already been encoded and you are only interested in the side effect of recreating the context; and once you recreate the context, you can resume appending to the bitstream. This looks like a lot of effort. In comparison, in stateless encoders the state is kept and maintained outside the hardware, so switching the context is a matter of swapping several pointers.

So now that you have a better idea of stateful versus stateless, let us summarize the differences. The hardware is more complex in stateful codecs, so it is also more expensive; it is less complex in stateless codecs, so it is cheaper. With stateful codecs the dynamics on the software side are bigger, because such codecs are often implemented as dedicated subsystems running their own internal firmware, and interacting with that firmware and keeping up with its state is a considerable effort. On the other hand, with stateless encoders there are many more hardware registers to cope with. In stateless encoders switching a context, and by switching a context I mean switching from one sequence to another, is less expensive than in stateful codecs. Stateless codecs also offer more flexibility, because various strategies, such as reference frame selection or rate control, can be changed, whereas in stateful codecs this is all hardcoded in the hardware. This does not necessarily mean that stateful codecs are bad; sometimes the tradeoffs might be in their favor, for example when you don't need to switch between sequences and the increased price of the hardware is not a problem.
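To make the "swapping several pointers" claim concrete, here is a minimal sketch of the idea; the struct and its fields are illustrative assumptions, not any real driver's data structures.

```c
#include <stdint.h>

/* Hypothetical per-sequence encoding context kept in software. With a
 * stateless encoder, everything the hardware needs to process the next
 * frame is submitted together with that frame, so the context lives in
 * ordinary memory. */
struct enc_context {
    void *last_ref;                  /* reconstruction of the previous frame */
    void *golden_ref;                /* long-term reference frame */
    uint8_t entropy_probs[1024];     /* entropy coder probabilities */
    unsigned int frame_cnt;          /* position within the sequence */
};

static struct enc_context seq_a, seq_b;
static struct enc_context *current_ctx = &seq_a;

/* Switching sequences costs a pointer swap, not a re-encode. */
static void switch_to(struct enc_context *ctx)
{
    current_ctx = ctx;
}
```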
That said, hardware vendors tend to provide stateless solutions nowadays. So let us recap what we have been talking about until now: we know what codecs are, we are talking about stateless hardware encoders, and we know the difference between stateful and stateless. So now let us talk about how to use stateless encoders, VP8 in particular.

In 2020 Paul Kocialkowski published H.264 stateless encoding support. I tried that code this year, and it works. He also gave a pretty detailed talk about it at one of the ELC editions. Earlier this year I published an RFC patch series which adds stateless VP8 encoding support, and Benjamin Gaignard published GStreamer merge requests, and these two seem enough of a critical mass to start bringing stateless encoding support to the kernel. Both Paul and I developed our code using the Hantro encoder derivative which is found in the Rockchip RK3399, and both these encoders use the Request API.

So let's first understand what requests are in the context of Video4Linux. Think for a while about an old-style output device which receives frames from computer memory and generates a signal suitable for broadcasting. Suppose there is some control in the device, not necessarily a physical knob, but some hardware setting controllable from software; the control is maybe for changing the brightness, saturation or contrast of the generated signal. If frames A and B, for example, have already been emitted and you want the change to take effect from frame C on, the control needs to be changed at the right moment. Think what happens if you want this control applied to each frame separately: you need to wait until the previous frame completes, but before the next one starts, you need to apply the setting, and then not change it until the next frame. This is exactly the problem that requests in Video4Linux solve: they offer you a framework to apply control settings to each frame separately, without having to worry about the above-mentioned timing issues. The fundamental concept in that framework is a request object, which is created for each frame and needs to be queued together with your frame, and the framework takes care of all the rest. In video codecs, requests are useful for associating an output buffer (and you already know what output means in V4L2) with a set of controls to be applied when processing that particular frame, and in stateless codecs there are a lot of parameters to be associated with each frame.

So now that you know about the Request API, let's see how it is used. This slide might look complicated, but we will go through it step by step; the parts in green refer to the requests directly. The first ioctl is used to allocate a request object, whose file descriptor is returned in the third parameter. Then the usual V4L2 stuff follows. Extended controls in V4L2 are a bit like a descriptor of an actual controls array, which in this case contains a single element. We need to instantiate a VP8 encode params struct, which will hold the VP8-specific parameters and will be associated with the controls array. For clarity I omit the memset to zero, which you usually want before using these structs, and I omit the ioctl return values, which in real life you need to inspect and handle. We associate our ext controls with the encode params struct and declare that there is a single element; we tell the framework that it is the stateless VP8 encode params control ID, and we pass the pointer and size the usual V4L2 way.
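The slide itself is not reproduced here, so below is a minimal C sketch of the whole sequence, including the request-specific steps described next. The media request and extended-controls ioctls are the standard mainline UAPI; the VP8 control ID and params struct are the names from the RFC series and may still change before merging. Error handling and the surrounding setup (formats, buffer allocation, streaming) are omitted, just like on the slides.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>
#include <linux/videodev2.h>

/* V4L2_CID_STATELESS_VP8_ENCODE_PARAMS and its payload struct come
 * from the RFC patch series; they are not in mainline headers yet. */

void encode_one_frame(int media_fd, int video_fd, struct v4l2_buffer *raw)
{
    int req_fd;

    /* 1. Allocate a request; its file descriptor is returned via
     *    the third ioctl parameter. */
    ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd);

    /* 2. The usual V4L2 extended-controls setup: a descriptor
     *    (v4l2_ext_controls) pointing at an array with one element. */
    struct v4l2_ctrl_vp8_encode_params params;
    struct v4l2_ext_control ctrl;
    struct v4l2_ext_controls ctrls;

    memset(&params, 0, sizeof(params)); /* omitted on the slide */
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&ctrls, 0, sizeof(ctrls));

    /* ... fill in the per-frame VP8 parameters here ... */

    ctrl.id = V4L2_CID_STATELESS_VP8_ENCODE_PARAMS;
    ctrl.ptr = &params;                 /* pointer and size, the */
    ctrl.size = sizeof(params);         /* usual V4L2 way        */
    ctrls.controls = &ctrl;
    ctrls.count = 1;

    /* 3. The Request-API-specific part: apply the control to the
     *    request rather than to the device immediately. */
    ctrls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
    ctrls.request_fd = req_fd;
    ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &ctrls);

    /* 4. Queue the raw frame (an OUTPUT buffer) as part of the
     *    same request. */
    raw->flags |= V4L2_BUF_FLAG_REQUEST_FD;
    raw->request_fd = req_fd;
    ioctl(video_fd, VIDIOC_QBUF, raw);

    /* 5. Enqueue the request itself; the framework applies the
     *    controls exactly when this frame is processed. */
    ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE);
}
```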
And now comes the Request-API-specific part (steps 3 to 5 in the sketch above), where we declare in the ext controls struct that we want the control applied to a request, and we pass the request file descriptor; then we proceed with the usual ioctl, but at this moment the control has been applied to the request rather than to the device. Then we need to enqueue the request, which happens in the last ioctl. And that's all there is to the VP8-specific part, except rate control, but we will get to that later.

For the curious, this is how the VP8 encode params struct looks. On the one hand it maybe looks large, but on the other hand, compared to other modern codecs such as AV1 or VP9, it has relatively few members. If you know the VP8 spec, the names will look familiar to you, and if you don't, be informed that they have direct counterparts in the spec. Of course these cannot be set to whatever you like; there needs to be some strategy, and remember that the spec says nothing about how to build a valid bitstream, it only says what a valid bitstream looks like. In the GStreamer element we take a very straightforward approach, so if you want some more clever strategy to be applied when generating the VP8 bitstream, you have an opportunity to get involved in GStreamer development.

So let us recap what we have been talking about until now: we know what codecs are, we are talking about stateless hardware encoders, we know the difference between stateful and stateless, and we know how to use the Request API with stateless encoders.

Let us now talk about rate control. Rate control is a general process you want applied when encoding, regardless of the particular codec type; let's see what it is, why it is important, and why you want it applied. Given the nature of video encoding, the sizes of encoded frames vary and are generally unknown up front, the general rule being that intra-coded frames tend to occupy more space than inter-coded frames, and rate control has to do with how much space each encoded frame occupies. From the point of view of a decoder, as long as you don't want to alter the bitstream, there is no rate control: depending on the width of your communication channel, be it an internet connection or the physical interface to your local storage device, each frame will take a different amount of time to travel from source to destination; maybe you apply some buffering at the receiving end, and as long as there is enough bandwidth, you will be able to present the frames live at the desired FPS rate. This is very different while encoding, the moment when the size of each encoded frame can still be influenced, and rate control is about influencing the sizes of encoded frames so that, at a desired FPS rate, the resulting bitstream stays within desired limits.

Different strategies can be applied for different tradeoffs. For example, with constant quantization parameter each frame is quantized using the same QP value, so constant quantization parameter implies maintaining the same QP for each frame at the expense of a varying bit rate. On the other hand, the constant bit rate strategy assumes that frames transmitted at a fixed FPS rate generate a constant stream of data, so constant bit rate implies maintaining the bit rate at the expense of varying frame quality; for example, a 3 Mbit/s constant bit rate target at 30 FPS budgets 3,000,000 / 8 / 30 = 12,500 bytes per frame on average. Average bit rate means maintaining the average bit rate over a period of time at a desired level. And there are certainly other options possible. The VP8 RFC allows per-frame constant QP, which means that the quantization parameter is fixed for an entire frame but can be different for different frames.
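The slide with the actual struct is not reproduced here, so as a rough idea, below is a hypothetical sketch of its shape. The real RFC struct differs in its details, but its members mirror the VP8 frame header from RFC 6386 in the same way; the field names below come from that spec, while the struct itself is my illustration.

```c
#include <stdint.h>

/* Hypothetical shape of the per-frame VP8 encode parameters; the
 * member names correspond to VP8 frame header fields (RFC 6386). */
struct vp8_encode_params_sketch {
    uint32_t key_frame;                /* intra-only, self-contained frame? */
    uint8_t  loop_filter_level;        /* deblocking strength */
    uint8_t  sharpness_level;
    uint8_t  y_ac_qi;                  /* base quantizer index, 0..127 */
    int8_t   y_dc_delta;               /* per-plane quantizer deltas */
    int8_t   y2_dc_delta, y2_ac_delta;
    int8_t   uv_dc_delta, uv_ac_delta;
    uint8_t  refresh_last;             /* which reference slots this */
    uint8_t  refresh_golden_frame;     /* frame should update */
    uint8_t  refresh_alternate_frame;
};
```

Note that the base quantizer index, y_ac_qi, is exactly the knob the rate control strategies above turn.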
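And since per-frame constant QP is what comes next, here is a minimal sketch of the proportional ("P-only") adjustment described below for the GStreamer element; this is my own illustration of the idea, not Benjamin's actual code, and the gain and clamping values are made up.

```c
/* One rate-control step per encoded frame: compare the measured
 * bitrate against the target and nudge the quantizer proportionally. */
static int update_qp(int qp, long target_bps, long measured_bps)
{
    long error = measured_bps - target_bps;

    /* Stream too big: quantize more aggressively (raise QP).
     * Below the limit: quantize less aggressively (lower QP). */
    qp += (int)(error / (target_bps / 16));     /* proportional gain */

    if (qp < 0)
        qp = 0;
    if (qp > 127)
        qp = 127;                               /* VP8 quantizer range */

    return qp;
}
```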
Now let's have a look at using it to specify per-frame constant QP. The Request API is used, so the QP should in fact be applied to the same request the VP8 encode params have been applied to, and the usual procedure applies: we build an ext controls struct which points to the actual controls array, in this case containing a single element; we specify the control ID and its value, and this time there is no need to pass a pointer and size, because it is a simple numeric control; at the end we associate the ext controls with our request and call the ioctl the usual way.

In the GStreamer element published by Benjamin there is a very simple algorithm implemented which looks like the P part of a PID controller; by PID I mean the classical control theory algorithm, so the P is for proportional. This very simple implementation does only the following: when the bitrate becomes too large, we quantize more aggressively, and when the bitrate falls below the limit, we quantize less aggressively. So again, there is another opportunity for you to step in and help with GStreamer development, and maybe offer PID or some other clever algorithm for rate control.

OK, so let us recap again what we have been talking about: we know what codecs are, we are talking about stateless hardware encoders, we know the difference between stateful and stateless, we know how to use the Request API with stateless encoders, and we know what rate control is and how we can apply per-frame constant QP.

So let us now talk about possible future directions for stateless encoders in Linux. You might say that today VP8 is already an old codec, and indeed it is, and this reason alone makes it rather unlikely that new VP8 encoding hardware will ever appear. Actually, to the best of my knowledge, Google had been giving away a design of a VP8 IP block for free, so there is little incentive for people to reinvent the wheel, because designing an IP block takes a lot of skill and resources. That said, VP8 is the lingua franca of video conferencing, because if everything else fails, then VP8 it is; so it is still relevant today, and it is simple, so it makes sense to start adding stateless encoding support with it.

You are lucky, because I am bringing you the very latest developments from the media summit we had only this Monday. In the Linux kernel there is this two-drivers rule, which means that you cannot upstream a new UAPI unless you have at least two drivers using that API. But given what you have just heard, we are in trouble, because quite likely there is only one kind of VP8 encoding hardware. So what do we do? The conclusion is that if we can reasonably make sure that there is indeed only one kind of VP8 encoding hardware available, then the two-drivers rule will become a one-driver rule. So if you can help me make sure that this is indeed true, or if you are able to prove me wrong, your help would be greatly appreciated.

Another takeaway from the media summit is that the UAPI should support three reference frames. For now it supports just one, and that is because the hardware we are using handles only one; actually it should be able to handle two, but we don't know yet how. In any case, the UAPI should allow the full set.

One more takeaway, still VP8-related, is where the frame header should be generated, and it seems we are OK with generating the frame header in the kernel. There are two reasons for that. One reason is that in VP8 there is this notion of probabilities which are adaptively updated after each frame, and they need to be put in the frame header so that the decoder can read them.
These probabilities are updated by reading hardware counters: the encoding hardware features counters which accumulate how many times a given symbol has occurred in the bitstream, and because these are hardware counters, the kernel seems a natural place to access them. Roughly speaking, if the counters say a given boolean-coder branch went the zero way 600 times out of 800, the updated probability is about 255 × 600 / 800 ≈ 191. The other reason for assembling the frame header in the kernel is that these probabilities tend to be quite a large set of data, so if we wanted to assemble the frame header in user space, we would have to come up with some dedicated control and pass this large set of data to user space. So, all in all, it seems it makes sense to generate the frame header in the kernel.

These three takeaways were VP8-specific; the rate control conclusion from the media summit is not specific to any codec. Per-frame constant QP is OK and it seems we want it, but if we assumed that only user space is allowed to do rate control, then we would end up unable to use certain hardware that does have some support for hardware-assisted rate control. So maybe we want an option for individual drivers to override whatever the user wants and use their hardware to actually do rate control. And, though that is not quite clear yet, maybe we also want an internal rate control algorithm for all encoders to use; but that is subject to an RFC, and when an actual RFC appears, a discussion can follow on whether we want it or not.

In 2023 it would not be possible not to talk about AI. A natural place to look for AI assistance in codecs is to improve encoders, for example by using AI to select reference frames, or to do rate control, or both. But another option is possible, maybe. There is this experiment with text-to-speech synthesis, which concerns an audio signal, but I see no reason why the idea cannot be applied to a video signal as well. In this text-to-speech experiment they are doing a clever thing. The natural way to do text-to-speech would be to generate the waveforms first, maybe with some AI help to get nice pauses and nice intonation, and then apply well-known compression techniques. But what they are doing is much bolder: based on the original text, they try predicting with AI what the encoded bitstream will look like, completely bypassing the waveform stage. So maybe the same concept can be applied to video encoding, who knows?

So let us recap what we have been talking about until now: we know what codecs are, we have been talking about stateless hardware encoders, we know the difference between stateful and stateless, we know how to use the Request API with stateless encoders, we know what rate control is and how we can apply per-frame constant QP, and we have seen possible future directions for stateless encoders. This concludes my presentation. If there is one takeaway, it is that stateless encoders are coming to Linux. Thank you for your attention, and are there any questions?

Thank you very much, we are right on time. If you have questions, please raise your hand.

Thanks for the talk. What about H.264 and H.265 encoding, how does that look?

So for H.264, Paul published his patch series and his user space program in 2020, and to the best of my knowledge he would be willing to continue with that, but is out of resources for it; anyone can pick it up and continue. For H.265, I don't know.
But we need to start with something, VP8 seems like low-hanging fruit, and there is no stateless encoder support in Linux whatsoever at this moment.

Just an observation really: you are the first speaker today who hasn't mentioned Rust. But once Daniel, who is in the front row, succeeds in improving V4L2 support in Rust, you will be able to write the new driver in Rust.

Hey, so I'm Paul, I just wanted to follow up on the H.264 question. Indeed, on the Hantro/Rockchip side we just have this kind of proof of concept that you linked to, but I'm actually now working on H.264 encoding for the Cedrus driver, which is the Allwinner encoder, and it's working as of last week. I'm also working on the UAPI stuff, so it's definitely also coming for H.264. For H.265, as far as I know no one is really looking into it at the moment, but there's stateless hardware around, so it should come around at some point. I also wanted to comment on the VP8 two-drivers thing: it looks like the Cedrus encoder, which is the Allwinner video engine encoder, is also able to do VP8, and it's the same block that is used for H.264, so it might be the second driver that you're looking for, if someone has time to look into that.

But do you think it's truly a different piece of hardware, or is it the same IP block packaged in a different chip, maybe with a different register layout?

From what I can see they're using the same blocks; the VP8 and H.264 encoders share some concepts and everything, so to me it looks more like they designed their own block and it supports both codecs. It's not like the Google one that they retrofitted, because it really looks like it's using the same internal stuff. And I know that the H.265 encoder is definitely not the Hantro one, so it might be an actual different implementation.

OK, talk to you later. Any more questions? We don't have any questions from the virtual audience. In that case, thank you very much, thank you for the great talk, goodbye.