So, hey everyone, my name is Daniel Almeida. I'm with Collabora, and I'll be speaking to y'all today about video codecs and the virtual stateless decoder driver, visl, which is a driver I worked on one or two years ago, if I'm not mistaken. I want to start by introducing myself a little bit. Then we're going to talk more about video codecs, what they are and why we need them in this day and age. After that we'll talk about the V4L2 video codec API and how it works, and then we can talk about the visl driver proper: what it is, what its purpose is, what it does, and how you can run it on your machine. So without further ado, let's get started.

Who am I? I was in the Linux Kernel Mentorship Program, and I joined Collabora in 2021. Nowadays I mostly define myself as a person working on GStreamer, which is a multimedia framework for Linux and for other platforms as well. So I'm mostly doing multimedia-related programming in GStreamer, and I'm also involved in the interaction between GStreamer and the V4L2 subsystem. Most of my contributions are multimedia related. I also work with other video codec APIs like VAAPI; most people have heard of VAAPI, and I think it's the most famous video acceleration API we have out there for Linux today. So on top of V4L2 I also work with other APIs like VAAPI. Anyhow, without further ado, let's get started.

So what are video codecs, which is, you know, the main topic of this presentation? Video codecs exist basically because otherwise we couldn't have video in this day and age. Why? Because video has been growing in resolution for a very long time now. Nowadays full HD and 4K are pretty common, so video is becoming bigger and more demanding each and every year. If we didn't have video codecs, we wouldn't be able to transmit video, to stream video online, or to feasibly store video on our machines. It would simply be too big for that. Not all video is compressed, by the way; this is one thing I want to point out. If you have a video game console and you're connecting it over HDMI to your television, that's not compressed video. But for all other purposes, like storing video or streaming through streaming services, then yes, it has to be compressed; otherwise it's simply too big.

Lucky for us, video signals are full of exploitable redundancies. What do I mean by this? By the very nature of video signals, there's a whole lot of information in them that we can exploit in order to reduce the file size. And this is what video codecs are: pieces of code that can compress a raw video signal and decompress it for presentation at a later time, by capitalizing on the redundancies that are inherent to video signals. Most of the time this process is lossy. What that means is that if you take a raw video signal coming from, let's say, a sensor like a camera, and you compress it, then most of the time when you decompress it the result isn't going to be as good as the original. And this is the whole trade-off.
You want to compress efficiently and arrive at a passable approximation that most of the time isn't going to be as good as what you had before, but is passable: you can still use it almost as well as you could use the original signal. So as I said, the objective is to arrive at a passable approximation for a given bitrate and power envelope. Bitrate, by the way, is the amount of bits per second the encoder can put out. So let's go back to the camera. A camera samples the environment and translates that into a video signal, and once you run that through an encoder to produce a compressed video signal, one of the things you can tune is how many bits per second you want that encoder to put out. It's pretty intuitive that if you allow the encoder to put out more bits, you tend to get a better result, because you're allowing it to produce more information, to have a larger file size. And if you allow the encoder to use more computational power, you may also get a better result. These things are at odds with each other, right? You want the best quality you can have at an acceptable bitrate; you don't want the bitrate too high, you want to compress as much as you can. And you don't want to use a lot of power, especially in this day and age of mobile devices and battery-powered devices; you also don't want to heat up your machine too much. So this is basically what a video codec is. The name "codec" is shortened from encoder/decoder; that's where it comes from.

Usually a codec standard will put out a spec that says: here is how you should decompress this data. And I said decompress, because only video decoding, the decompression, is standardized. The logic here is that you don't standardize how to encode, because you want people to innovate on the encoding side of things. You want people to come up with better and faster ways to do video encoding, and that's okay so long as the result decodes; so you only standardize the decoding portion. If you go online and look at the video codec standards out there, for instance H.264, which is a very famous one, or HEVC, VP9, or AV1, what you're going to come across is a spec that tells you how to decode an H.264, HEVC, or VP9 stream.

I said that video codecs operate on the basis that you can compress video by exploiting redundancies. So what are these redundancies? Very quickly, because we don't want to spend too much time on this, we have different types of redundancies. The first and most obvious one is spatial redundancy, which is the observation that pixels close to one another tend to be similar in value. If you take a photograph of someone and look at their skin, the pixels are going to be roughly the same color. If you look at their shirt, roughly the same color. If you look at a particular object in the background, again roughly the same color. So we can compress video by exploiting the fact that nearby pixels tend to be similar: we don't need to encode that much information, we can just copy from adjacent pixels.
We also have temporal redundancy, because video is just a sequence of frames, right? You're playing a bunch of static images at a fast enough rate that your brain interprets it as video and not as a bunch of images stitched together. Temporal compression is the observation that if you have a picture, the pictures adjacent to it, with exceptions of course, are most of the time going to be similar, especially as you increase the frame rate. Say you're capturing video at 60 frames per second: the amount of time elapsed between frames is so small that very few things will have occurred in the environment; not a whole lot will have changed. This also leads to compression. You can say: if I'm taking a bunch of pictures and they look alike, I don't need to encode information for all of them. I can encode just one of them and derive the following pictures from the pictures before them. This is a huge source of compression for video data.

We have chroma subsampling, which is the observation that our eyes perceive changes in the intensity of light much better than they perceive changes in color information. If you take a video signal, it depends on how you're encoding it, but most of the time in video we work with something called YUV, which is three planes. The Y plane, or luma plane, contains the information about the intensity of light, roughly speaking. The U and V planes are what give the picture its color; they carry the color information of the image. If you downsample the U and V planes, you do not notice as much degradation as you would if you downsampled the luma plane, the Y plane. So this is also a source of compression: you can slash half of the color information in a picture, and the majority of people aren't going to be able to tell the difference anyway, so you can throw away a whole bunch of data that you would otherwise have to encode.

There's also quantization. Quantization is just a fancy term for division, basically. Without getting too much into the specifics of video codecs: at some point you're going to translate the video signal into the frequency domain, and then you're going to divide the resulting coefficients by a given quantization value. The idea is that when you quantize these coefficients, you can make them go to zero, or close enough to zero that you don't have to encode and transmit them. This is another source of compression. But quantization is lossy, right? When you start quantizing coefficients, you start losing information. So if you're too aggressive, if you use too high a quantizer value, you start noticing degradation pretty quickly. You have to be careful not to quantize too aggressively; otherwise, yes, you're going to get a smaller file, but you're also going to get quality degradation.

And then there's entropy coding, which uses probability distributions to achieve further compression. And the state of the art nowadays, I think, is using AI and machine learning to find new ways to compress and decompress video. So these are the compression techniques we can use, and they're what lets video actually be streamed, transmitted over the internet, stored on a hard drive, and so on.
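To put rough numbers on why raw video is infeasible to ship around, and on what chroma subsampling alone buys you, here is a quick back-of-the-envelope calculation. This little C program is an illustration added for this write-up, not something from the talk, and it assumes 8-bit samples at 1080p30:

```c
#include <stdio.h>

/* Back-of-the-envelope: raw size of an 8-bit 1080p30 stream,
 * with full chroma (4:4:4) versus subsampled chroma (4:2:0). */
int main(void)
{
    const double w = 1920, h = 1080, fps = 30;

    /* 4:4:4 - one luma and two full-resolution chroma samples per pixel. */
    double bytes_444 = w * h * 3;
    /* 4:2:0 - chroma is halved in both dimensions, so the two chroma
     * planes together add only half a sample per pixel (1.5 total). */
    double bytes_420 = w * h * 1.5;

    printf("4:4:4 frame: %.1f MiB, stream: %.0f Mbit/s\n",
           bytes_444 / (1024 * 1024), bytes_444 * fps * 8 / 1e6);
    printf("4:2:0 frame: %.1f MiB, stream: %.0f Mbit/s\n",
           bytes_420 / (1024 * 1024), bytes_420 * fps * 8 / 1e6);
    /* Roughly 1.5 Gbit/s and 750 Mbit/s respectively, versus the
     * single-digit Mbit/s a codec typically needs for 1080p30. */
    return 0;
}
```

Even before any of the cleverer techniques kick in, subsampling the chroma to 4:2:0 halves the raw data; the remaining orders of magnitude come from the spatial, temporal, and entropy tools described above.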
Okay, with all that said, my question is: can we make this faster? Why? Because we're human beings, right? We like faster stuff; we like performance; and especially, we like it when things go faster while consuming less power. Again, especially in a world of mobile devices and battery-operated devices. So can we make this faster? Yes, we can, through hardware accelerators. By using hardware accelerators, we can implement the encoding or the decoding in hardware and be more power efficient. We can have the main CPU idling for longer, because the actual video encoding or decoding is taking place in hardware; that frees up the main CPU to do other things.

But this also comes with a few drawbacks, one of which is that it's less flexible, because once you synthesize that into hardware, there's not much you can do, unless you have an FPGA, and most people don't have an FPGA just lying around; they're using their GPUs or their CPUs. What I mean is that you're limited to what was synthesized. Video codecs work with different profiles, because as y'all can imagine, the requirements for an IMAX movie theater, for instance, or for live broadcasting, are very different from people watching, let's say, YouTube or any other streaming service. The way we solve this is with different profiles, which give you different levels of quality and support for a given use case. And once it's synthesized into hardware, if your particular video encoding or decoding IP only supports a given set of profiles and you want to decode a higher profile, you're obviously not going to get it. So it's a less flexible approach.

The other drawback is that now you need driver support for this. Why? Because now you have hardware to drive, right? So you need driver support; you have to have a driver in order to use the hardware. And you also need an API to communicate with this driver. So enter the world of video codec APIs, because if we want to have an accelerator, we have to have a video codec driver and a video codec API to drive it. Some of the video codec APIs out there, in particular on Linux: we have VAAPI, which was created by Intel, primarily for UNIX-like systems, but it also works on Windows and other platforms. We have DXVA from Microsoft, for Windows and for Xbox. We have proprietary APIs like NVENC and NVDEC for people with NVIDIA GPUs. We have new up-and-coming APIs like Vulkan Video, which uses Vulkan to do video decoding. And we also have Video4Linux, among others; I've just listed a few video codec APIs here.

So why do we have so many video codec APIs? Because some are suitable for some platforms and some are suitable for others. If you don't have an NVIDIA card, you're not going to be using NVDEC or NVENC. If your video decoding or encoding IP is not part of a GPU, then you're probably not going to be using VAAPI, which supports video codec IPs that live in GPUs, basically. So the different video codec APIs out there cater to different use cases and different hardware, and this is why we have so many of them.
And this is also a presentation about the Linux kernel, right? So how is the kernel related to everything I have just said? So far we have spoken about what video codecs are, and about different APIs; how is the kernel related to all of this? The answer is: some APIs out there, like VAAPI, have a huge userspace component. If you're using VAAPI on Linux, what's happening is you have a userspace application, let's say Chrome or Firefox or VLC or whatever, and this application makes calls into VAAPI, which sends those calls to a VA driver. But this VA driver lives mostly in userspace. If you have Intel hardware, this is going to be Intel's media driver, if I'm not mistaken; if you have an AMD GPU, the support is provided by Mesa. So you have a huge userspace component, and this component basically sends a command buffer to the kernel saying: I want to decode video, here are the various addresses I have set up in order to drive the hardware. The userspace component is in charge of setting up everything; the kernel part is responsible for allocating buffer objects and scheduling jobs on the GPU for execution, for instance. My point being: some codec APIs out there have a huge userspace component and a much thinner kernel-space driver.

Other video codec APIs, like Video4Linux, have basically no userspace component. The app, be it Chrome or Firefox or any other app, makes calls directly to the kernel, and the kernel driver has to account for that: it's bigger than what we have for VAAPI and other codec APIs. So the client program uses an API to talk directly to the kernel, and the kernel programs the hardware. This API that we use to talk to the kernel, in this particular case, is what we call a uAPI, a userspace API: an API that userspace can use to tell the kernel how to set up and program the device, here specifically to encode or decode video.

So this talk is about the V4L2 codec uAPIs and the visl driver, right? And what is V4L2? Well, it's a framework slash API for various multimedia devices: cameras, digital television, radio tuners, and other devices. I don't really care about those devices for the purpose of this presentation; here I am mostly interested in V4L2's support for video codecs. And it turns out that it does have support for video codecs as well, as of somewhat recently. I say somewhat recently because I think stateful support has been there for, I don't know, maybe ten years, I'd say, but stateless support has been there for much less time; 2017, I think, saw the first iteration of stateless support in the Linux kernel and Video4Linux2. I'm using these terms, stateful and stateless APIs, and I'm going to talk much more about what they are, so don't worry if you don't understand what these two terms mean yet.

Before we can talk about the V4L2 APIs, and the stateful and stateless APIs and what they mean, we first have to understand what is inside a codec bitstream. Let's say you're on a website watching a video clip, or you have downloaded a video file to your computer. What is in there, right?
The most obvious thing in there is the compressed video data; that's what everybody expects to be there, right? But that's not all; there's more in the file. The other major component you're going to find inside a bitstream is something we call the metadata. Why do we have metadata? Because with only the compressed video data, it turns out you're not able to decode it. You need an extra block of metadata to be able to drive the decoding process, to set up state and control the decoding process. This metadata is sent together with the actual compressed data in a single package or unit, let's say, that you can then open up and use to decode.

So, as I was saying, inside the bitstream we have metadata, that block I've just shown, and this metadata controls the decoding process. It can be metadata that persists between frames, meaning: hey, I've just started a stream, and I want to give you some data that should be valid for the entire duration of the stream. Let's say you're watching a movie; I might want to persist data that's valid for fifteen minutes, or for the entire movie. You can do this. And another thing you can do, or actually must do, is have metadata that applies to one particular frame. If you're familiar with some codec standards out there, typical metadata that applies to a lot of frames at once is the SPS, which stands for sequence parameter set, or the PPS, which stands for picture parameter set, or the VPS, the video parameter set. These things apply to more than one frame. And you can also have metadata that applies to a single frame. On top of that, as I said, you have the actual compressed data that you have to decompress.

Why are we talking about this, right? At first it seems unrelated, but it really isn't, because the handling of this metadata block is what differentiates a stateful video codec device from a stateless video codec device. In a stateful codec device, as the name implies, the hardware takes the entire chunk, both the compressed data and the metadata, and keeps track of the metadata internally, because it has a microcontroller running its own firmware that can, A, parse the bitstream to separate the metadata from the compressed data, and B, keep track of that metadata internally, within the device itself. I think the most helpful way to think of a stateful device is as a black box. Why a black box? Because you're sending data in, the stateful device is taking that data, doing its own thing with its own firmware, and giving you decoded frames back, and you don't really have to know how it's doing it. You push data in, and you get encoded or decoded data back.

In contrast, a stateless device is the opposite: if a stateful device works like a black box, a stateless device works like a clean slate. Not only do you have to feed it the compressed data to decode, you also have to feed it the metadata, on a frame-by-frame basis, so that it can actually decode the data for you. It's much simpler. It cannot keep track of the metadata on its own.
So it exposes registers that you have to set with the values you extract from this metadata, so that it can actually decode the data. These are the main differences: stateful can keep the state within the device, and stateless basically can't. With stateless devices, you have to program the metadata manually: you extract the metadata in userspace and then tell the kernel to program the device with the actual metadata values. Stateless devices tend to be simpler, because they keep track of fewer things on their own, but in turn you need more software, more code, to drive them. A stateless API is a little more complicated, because it also needs to expose some way for you, the programmer, to send this metadata to the kernel, so that the kernel can send it on to the device. And the way we do this is through what we call the codec uAPI. The whole idea is that we parse the stream in software, inside, let's say, GStreamer or Firefox; actually, I think Firefox uses FFmpeg, which is another very famous library on Linux. So we have a userspace application or library parsing the stream in userspace, and then using the codec uAPI to send this data to the Linux kernel, so that, again, the driver in the kernel can program the device with the right metadata values.
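As a rough illustration of where "parsing the stream in software" starts, not code from the talk: for H.264 streams in Annex B format, a parser walks the bitstream looking for start codes and classifies each NAL unit, so it knows which payloads are SPS/PPS metadata and which are slice data. A minimal sketch in C, using the start-code and nal_unit_type layout from the H.264 spec:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Minimal Annex B scanner: find H.264 NAL units by their start codes
 * and report their types. Real parsers (GStreamer, FFmpeg) then fully
 * decode the SPS/PPS payloads to fill in the codec uAPI controls. */
static const char *nal_name(uint8_t type)
{
    switch (type) {
    case 5:  return "IDR slice";
    case 7:  return "SPS";
    case 8:  return "PPS";
    default: return "other";
    }
}

void scan_nals(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        /* A NAL unit begins after a 00 00 01 start code
         * (optionally preceded by one more zero byte). */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            uint8_t type = buf[i + 3] & 0x1f; /* nal_unit_type */
            printf("NAL at offset %zu: %s (%u)\n",
                   i + 3, nal_name(type), (unsigned)type);
            i += 2;
        }
    }
}
```

The real work is in decoding the SPS/PPS payloads themselves, but this is the shape of the job that moves into userspace with a stateless device.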
And Collabora has been merging support for the codec uAPIs for the major standards out there. This includes H.264 and H.265, which are very famous, VP9, which is very famous in the open source world, and the up-and-coming AV1 codec, which is basically state of the art.

So anyhow, let's recap what we know so far, what we have discussed. A video codec basically compresses video; we need that because raw video files are very large, so we need some compression going on. We can do it faster, using less power, with hardware acceleration. If we're using hardware acceleration, if we have a separate piece of hardware just to encode or decode video, we need, A, a driver, and B, an API to talk to this driver and to the kernel. This presentation is about one of these APIs, namely the V4L2 API for driving V4L2 decoders and encoders. The V4L2 APIs come in both stateful and stateless flavors; there are two ways you can communicate with V4L2 to encode and decode video. With the stateful API, we have a black box: you only send the bitstream, the device does its thing, and it gives you the encoded or decoded data back. With the stateless API, the programmer has to do a little more work: they must also send another block of data, what we call the metadata, through the so-called codec uAPIs, so that the driver can use those metadata values to actually program the device. Collabora has been merging support for these uAPIs into the kernel; some of the codecs we have merged are open like VP9 and AV1, and some aren't, like H.264 or HEVC.

With all that said, we can finally understand what visl is, the visl driver. What is visl? visl is a virtual stateless decoder driver. We know what a stateless driver is; we know what a decoder is, we've talked about decoding video previously. The keyword here is virtual: it doesn't drive real hardware, it doesn't drive a real accelerator. But other than not driving real hardware, it presents itself to the kernel and to userspace programs just like any other driver would. It implements something we call a decode loop, like any other driver would; it just does not drive any real device.

Daniel? Yes? Sorry, there is a question in the chat; would you like to answer that now or later? Now is fine. Okay. So the question is: can stateful codecs be described as hardware codecs? Well, both are driving hardware, right? If you have a stateless codec, you're also driving hardware, unless you're using visl, because visl is a virtual driver. So you can use both stateful and stateless codecs to drive hardware. I think that answers the question. Okay, thank you; that's the only question we have. Go ahead, sorry, continue. Thank you. How many minutes do I have? You're about 35 minutes in, so you still have roughly 55 minutes to go. Oh, that's fine. You're doing good on time. Thank you.

So, going back one slide: I said visl implements a decode loop like any other codec driver. So what is a decode loop, right? I've taken this picture from Hans Verkuil, one of the maintainers of Video4Linux, so thanks, Hans, for this diagram. What we can see in this diagram is a decoding loop: how a V4L2 codec driver operates. The very first thing that stands out is that we have two queues. On the left-hand side, written in bold on top, we have something called an output queue, and on the right-hand side we have something called a capture queue. The idea is that a userspace program runs a loop, and it runs this loop for as long as it has data to decode. Taking a video decoder as the example, the loop consists of submitting the bitstream data into the output queue. Eventually you submit enough data that the codec device can start processing it, and once it does, it will start producing decoded frames on the other queue, the capture queue. Then userspace can start dequeuing the buffers with decoded data from the capture queue. And it's a loop: you queue more raw bitstream data into the output queue, you get more buffers with decoded data from the capture queue, and you repeat this process until you have no more data to decode. So there's a loop going on in the userspace program, and the userspace program is talking to the kernel so that the kernel can program the hardware to do the decoding.

There is one question on the previous slide; I thought it might clarify things for people. The question is: calling the output queue the input is confusing. Yes, yes, this is very confusing; it's a very normal question to have. It's been a major point of debate in the V4L2 community for years, basically. And to answer your question: yes, it's very confusing. The first thing to say is that we can't change this terminology now, because it's already in the kernel.
But the way that I, and other people, think about it to clarify it, and this is still going to sound confusing, but less confusing, is from the userspace point of view: you're going to be outputting data, compressed data, into buffers, and then you're going to be capturing data from the kernel. Hopefully that makes sense. If you think about a webcam, I think it makes more sense. A webcam is not like a codec device; a webcam has only one queue, because there's only one thing you can do with a webcam: capture the video data. You're not sending it any compressed data; you're only capturing data from the sensor. So that's where the name capture queue comes from, from devices like webcams where you're capturing data; and the name output queue comes from the fact that you're outputting data to the device, from a userspace perspective. It's very confusing, and it takes a little training and practice before you stop getting tripped up by these two terms.

So, as I was saying, the visl driver implements a decoding loop as well. From the perspective of a userspace program, there's no difference; the userspace program doesn't really know it's talking to a virtual driver. All it knows is that once it starts talking to this device, it can establish a loop in which it sends data to the device, the device "processes", quote unquote, this data, and it gets data back from the device, like any other real device. It doesn't know that it's a virtual device at all. What userspace also knows is that it can use different codec standards with this device. It knows that if it's reading VP8 data, VP8 being one standard, it can use this device to quote unquote decode it. If you gave your userspace program a VP9 video file, it can also send that to visl to decode. If it has HEVC content, AVC content, MPEG-2 content, it can send those to visl. And as far as userspace is concerned, visl is decoding these things, but not really, because the whole point of visl being a virtual driver is that visl doesn't really decode anything. We're going to talk more about what it means to not decode anything, and why I keep putting "decode" in quotes.

But what you can use visl for, and why it's useful, is that you can trace a whole bunch of debug information through it. And why would you want to use a virtual driver to trace anything? Well, because if you're a video codec developer writing software to drive a hardware accelerator via Video4Linux, you want to have some feedback on what you're doing. You want something that's going to make your life easier. And for Video4Linux, for stateless decoders, that something is visl. With visl you can see the state of the queues, both the capture queue and the output queue, as I've shown in the previous slides. Why is that important? Because you have to queue and dequeue data on these two queues, right? You'll remember from two or three slides ago that I said you have to be queueing and dequeuing buffers constantly on these two queues until you have no more data to decode. And one of the things that can happen, for instance when you're just getting started, is that you get this process wrong. It happens.
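To make that queue/dequeue loop concrete, here is a heavily compressed sketch of what it looks like at the ioctl level. This is an added illustration, not code from the talk: a real player also performs VIDIOC_REQBUFS/VIDIOC_STREAMON setup, typically uses the multiplanar buffer types, attaches the codec controls to media requests, keeps several buffers in flight, and checks every return value, all of which is omitted here:

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Skeleton of the V4L2 mem2mem decode loop: feed compressed data on
 * the OUTPUT queue, harvest decoded frames from the CAPTURE queue. */
void decode_loop(int fd, int (*have_more_bitstream)(void))
{
    while (have_more_bitstream()) {
        struct v4l2_buffer buf;

        /* Queue one buffer of compressed bitstream on the OUTPUT queue. */
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        buf.memory = V4L2_MEMORY_MMAP;
        /* ... fill buf.index, buf.bytesused, copy the bitstream in ... */
        ioctl(fd, VIDIOC_QBUF, &buf);

        /* Dequeue a decoded frame from the CAPTURE queue. */
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_DQBUF, &buf);

        /* ... consume the decoded frame, then re-queue the capture
         * buffer so the device can fill it again ... */
    }
}
```

Getting the ordering and ownership of buffers on these two queues wrong is exactly the class of mistake the speaker describes next.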
Hopefully you can catch that kind of mistake by using visl, because visl will output the state of the queues for you. It also gives you the state of the DPB, the decoded picture buffer, which is basically a set of buffers containing previously decoded frames that you're using as references for the frame you're currently decoding. Another thing a video codec programmer can do that will definitely throw a wrench into things and break the decoding process is making a mistake when setting up the DPB: saying that a picture should be part of the DPB when it shouldn't, or evicting a buffer from the DPB when it should stay in. If you make these mistakes, decoding breaks: the device will start to put out garbage, or you can crash, with the device complaining that the address you provided is invalid and spitting out a bunch of IOMMU errors or something. You don't want that to happen, and one way you can catch this is, again, by using visl.

You can also trace the bitstream metadata that you have submitted, because, as we discussed, visl is a stateless driver, and being a stateless driver, you have to use the codec uAPI to send the codec metadata for visl to, again quote unquote, decode. We spoke about this when we were explaining the stateless API and how a stateless device works. So with visl you can dump the metadata you have sent. And why is it important to dump this metadata? Because, again, one of the mistakes you can make is sending wrong metadata. You can have broken parsing in userspace: a parser that you thought was correct, that you had debugged, but it turns out you were wrong and there's a tiny little mistake somewhere. You can highlight this error by using visl: you can dump the data and say, oh, I expected value XYZ here, but I have a different value; this is wrong.

And how can you know that you're supposed to have a given value for some piece of metadata when you actually have a different one? This is another very important thing visl offers you: you can run visl with a working implementation. And this is very, very helpful. Let's say I'm working on GStreamer and I have the GStreamer implementation ready, and now I want to do an FFmpeg implementation. If I'm starting from zero on FFmpeg, it's going to take me much longer. But I don't have to start from zero: I can use visl to trace the GStreamer implementation, which I know is working, because when I use it I can see visually that it's decoding video fine, the results look good. I can use visl to trace it, and then compare that trace with the trace visl gives me for the FFmpeg implementation, and say: oh, these two things don't match; here's what I have to fix to get FFmpeg and GStreamer to match. This is very helpful. And it's not new, by the way: other APIs like VAAPI, which is another video API I work with, have similar mechanisms, like VA trace, and believe me, I've used VA trace for a long time. If you don't have this kind of mechanism, it's much, much harder to write video codec software.
Daniel, there are questions in the Q&A; it looks like there are two of them. I can read them to you if you can't see them. No, I can see them, but if you want to read them, that's fine. Okay: "When the uAPI is used with visl, userspace should also send the metadata, correct? If yes, how?" If I understood correctly, the question is whether we should send the metadata to visl as well. Can you repeat the question? Yes: when the uAPI is used with visl, userspace should also send the metadata, correct? If yes, how? Gautam, would you be able to rephrase this? It's not entirely clear to me. Also, Daniel, if it's clear to you, go ahead; otherwise we can ask Gautam to rephrase. I think I understand what the person is trying to ask: if we're using visl, should we also send the metadata? The answer is yes, you should also send the metadata, because, again, visl is a stateless driver like any other; it's just virtual, it doesn't drive real hardware, and that's the only difference. So yes, you should also send the metadata.

And for the second part of the question, how should you send it? Well, like with any other driver. How do you send the metadata to a driver that's driving real hardware? Through the codec uAPI that we have spoken about previously. This is actually a really good question, because we've talked about the codec uAPI, but we haven't said so far how this thing is really implemented in code. There's this thing in Video4Linux called V4L2 controls. I assume you know what an ioctl is; there are ioctls to set data into these controls. And this is how the codec uAPI is implemented: you have a bunch of controls, which are basically structs containing a bunch of fields that you can fill. You fill these in userspace, and then you issue an ioctl to tell the kernel: hey, I want you to set the values I've just sent you, and I want you to use these values to program the hardware. That is how you set the metadata in visl, and how you set the metadata in any driver, through the V4L2 controls.
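A minimal sketch of what one such control write looks like from userspace, added here for illustration: the SPS values are made up, a real stateless decode fills every field from the parsed bitstream, attaches the control to a media request rather than setting it directly, and checks errors:

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hand a parsed H.264 SPS to the kernel through the stateless
 * codec uAPI, i.e. through a V4L2 extended control. */
int set_h264_sps(int fd)
{
    struct v4l2_ctrl_h264_sps sps;
    struct v4l2_ext_control ctrl;
    struct v4l2_ext_controls ctrls;

    memset(&sps, 0, sizeof(sps));
    sps.profile_idc = 100;                   /* High profile */
    sps.level_idc = 40;                      /* Level 4.0 */
    sps.pic_width_in_mbs_minus1 = 119;       /* (1920 / 16) - 1 */
    sps.pic_height_in_map_units_minus1 = 67; /* (1088 / 16) - 1 */

    memset(&ctrl, 0, sizeof(ctrl));
    ctrl.id = V4L2_CID_STATELESS_H264_SPS;
    ctrl.ptr = &sps;
    ctrl.size = sizeof(sps);

    memset(&ctrls, 0, sizeof(ctrls));
    ctrls.which = V4L2_CTRL_WHICH_CUR_VAL;
    ctrls.count = 1;
    ctrls.controls = &ctrl;

    /* One ioctl: give the parsed metadata to the kernel, which the
     * driver then uses to program the device (or, in visl, to trace). */
    return ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);
}
```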
Let me add a little bit to what you already said, Daniel, if I may. This is a virtual driver, and if you think of it as a virtual driver, it is simulating the behavior of hardware, very similar to what QEMU does for an architecture, for example: you can run QEMU and it will emulate an architecture. The concept of simulation and emulation has been around for a very long time. If an organization or a product team developing hardware, firmware, OS, and userspace were to serialize all of these things, developing the hardware first, then the firmware, then going on to verifying the kernel, it would take a very long time to get anything out. So the concept is that you simulate how the hardware works so that you can verify your kernel well before the physical hardware is ready. Virtual drivers help us in a similar way, and we have other virtual drivers as well: vimc is another virtual driver, visl is one, and we have a few others in the Linux media subsystem. The Linux media developers do a very good job of providing these for userspace, so that people can continue to test without depending on hardware. That's how I view these things. Daniel, comment on what I said if you like.

Yes, so we do have other virtual drivers, as you said, in the media subsystem, and one of the things we want to be able to do, which I'm also going to talk about in a few slides, is to test without hardware. This is one of the major use cases for visl right now. We want to make sure you can test even when you don't have the hardware, because not all machines have hardware for all codecs. In front of me I have two or three different machines, a few different boards with different SoCs, and they all differ in capabilities. So you can imagine that it's very possible, in fact very likely, that the machine you're testing on won't have support for all codecs. But nevertheless, you want to be able to test the other components, like the userspace stack you're using to drive it. Let's say you're a company using GStreamer to drive your product; you want to make sure this doesn't break, and you can use the visl driver for that. I think Google is moving to use the visl driver to test their implementation in Chromium. So, testing without hardware: yes, it's one of the major use cases for visl.

There are a couple of questions in the Q&A, if you would like to take them now or later. There is a question from Premdeep: since this is visl-based, can we have multiple instances of the same codec running in parallel from different apps, without much delay? Yes, but this doesn't have to do with visl in particular; it has more to do with the fact that a stateless driver is a clean slate. As we said, the whole idea behind a clean slate is that you program it, you set all the data, you ask it to carry out a workload, it does it, and it returns without storing state within the device. What this means is that you can have, say, two sessions with very different media playing, and it doesn't matter, because the device is stateless: you program all the registers and everything to decode a given stream, the hardware carries out the workload, and then you can immediately program it with another, completely unrelated stream, and it will work just as well. That's a consequence of it being a stateless device.

I have one more question: how fast is the codec when using visl versus hardware? I'm thinking there's a performance difference; I mean, hardware is going to be faster, very likely, but go ahead. Actually, visl is very fast, because visl isn't doing anything. Actual hardware is very fast as well, but if you're submitting, let's say, a 4K stream and asking hardware to decode it, that's not trivial; it's fast, but it's still a workload it has to carry out. Whereas with visl, you're submitting the 4K stream the same as you would with a real driver, along with the metadata.
But visl will not touch the actual 4K content; it's not going to touch the compressed data, because it doesn't decode data. All it does is dump the metadata, as we've discussed; it dumps debug information, but it doesn't do the actual work of decoding the data, so it can be very fast.

Okay, one last question in the chat: you mentioned this is useful when there is no hardware support for the available codecs, but these codecs are mostly established and supported on current desktop systems; is there, or will there be, support for AV1? Okay, let me split this into two questions, and this is a very good question, by the way. The first part says the codecs are established and there's already support for the majority of them. Well, yes, but also not really. If you take AV1, I wouldn't say AV1 support is widespread. I, for instance, as I was saying, have three machines in front of me; none of them has AV1 support, and they're newish machines, machines I bought one or two years ago. If you have an Intel device, you have to have, I think, Gen 11 or Gen 12 to have AV1. If you have an NVIDIA GPU, you have to have one of the 40 series, the 4080s and 4090s. So my point is that AV1 hardware support is not widespread at all; there's a bit of a misunderstanding there, I think. And even for older codecs: you could apply what you just said to older codecs and say, well, HEVC or H.264 are already very established, they've been around for ten years in the case of HEVC, or twenty-plus years for H.264. Yes, but you may still have some device that's not able to decode them. It's less common, because they've been around for longer, but it can happen, and if it does happen, you want to be prepared. If you're testing, you want to be able to test all of them, no matter what the capabilities of the actual hardware you have are. And the second part of the question is about AV1 support: will there be AV1 support? Yes. In fact, somebody submitted a patch for AV1 support just yesterday, so it's in review. But to answer your question: yes, there will be AV1 support in visl as well.

Thank you, Daniel. You can resume your presentation. I don't know how much is left, and you have about 32 minutes. There's not much left; let's continue. Why should we bother with a virtual driver like visl? I think I've already gone through most of the content here: you want to test your userspace code when you don't have the hardware, and you want to prototype. When I wrote the AV1 stateless support for the Linux kernel, one of the things I wrote visl for was prototyping what worked and what didn't. It's easier, when you're still in the development stage, to have a virtual driver that you can tinker with and see what works and what doesn't, what is ergonomic and what isn't. So it helps with that. I also said that you can take a working userspace implementation and use it to develop a new implementation, with visl helping you out. This helps tremendously; I can't stress enough how helpful it is to be able to trace a working implementation with a driver when you're developing code for a new app.

And how is visl different from a real driver?
I think we've touched upon this as well. First, the obvious thing: a real driver will take the metadata that you pass in, as we said, using the codec uAPI, and use it to actually program a real device. It will arrange things with the kernel so that it can talk to a real device through DMA, and then it will use the metadata you sent to write actual device registers and actual device memory, so that the device can proceed with the workload. So a real driver uses the metadata to program the device. visl will not program any device, because it doesn't have an underlying device. What visl does instead is use the metadata to drive the V4L2 test pattern generator. And what is the test pattern generator? It's a piece of code, a component written by Hans Verkuil, who again is one of the V4L2 maintainers, and visl uses it to write a bunch of strings, a bunch of debug information, directly into the capture buffers. So if you run visl with GStreamer, for instance, what you get when you inspect the frames is a bunch of color bars, drawn by the V4L2 test pattern generator, plus a bunch of text, also written through the test pattern generator, containing information about the stream that you can use to debug it. visl will also use ftrace to dump this information. And I think the main difference between visl and a real driver, as we've discussed at length, is that visl doesn't decode any video: it uses the test pattern generator to dump information into the capture buffers instead of decoding video itself.

Another question people usually ask, and I've been asked this a couple of times, is: how is visl different from vicodec? vicodec, to start with, is a different driver entirely. And unlike visl, vicodec can actually encode and decode video, using its own codec standard, which is FWHT. So vicodec can really encode and decode; FWHT is a real codec standard, so to speak, albeit a very simple one. I think it was written by someone as part of an academic thesis, if I'm not mistaken. But it's functional, it works, that's the point; it can actually encode and decode. It's just not used in the industry; it's not like HEVC or H.264 or AV1, which have widespread industry use. And unlike visl, vicodec also has stateful support, which visl doesn't, because visl is focused on testing the stateless V4L2 APIs.

As I start to wrap up this presentation, one of the points I want to make, for the newcomers here, is that if you understand visl, you understand how codec drivers work. The major parts are all there: how you interact with VB2, the videobuf2 API; how you use the mem2mem (M2M) API; how you get the driver to probe; how you access the metadata values in the controls; and how you set up what we call a V4L2 control handler. All these little aspects of how you write a real driver are there in visl.
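As a taste of that driver side, here is a rough kernel-style sketch, loosely modeled on what visl and the real stateless decoder drivers do when registering codec controls; the context struct and function names are hypothetical, and error handling is trimmed:

```c
#include <media/v4l2-ctrls.h>

/* Hypothetical per-context state; real drivers embed the control
 * handler in their file-handle or context structure. */
struct my_ctx {
	struct v4l2_ctrl_handler hdl;
};

/* For standard stateless control IDs the framework fills in the
 * type, name and limits; the driver only has to name the ID. */
static const struct v4l2_ctrl_config h264_sps_ctrl = {
	.id = V4L2_CID_STATELESS_H264_SPS,
};

static int my_init_ctrls(struct my_ctx *ctx)
{
	v4l2_ctrl_handler_init(&ctx->hdl, 1);
	v4l2_ctrl_new_custom(&ctx->hdl, &h264_sps_ctrl, NULL);
	if (ctx->hdl.error)
		return ctx->hdl.error;
	/* The handler is then hooked up to the video device / file
	 * handle so the control framework routes the userspace
	 * VIDIOC_S_EXT_CTRLS calls shown earlier to this context. */
	return 0;
}
```

Reading the equivalent code in visl and comparing it with a real driver such as rkvdec is a good way to see which parts are generic V4L2 plumbing and which parts actually touch hardware.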
So for anybody out there who may be thinking, how do I write my first driver for real hardware, where do I get started? I would say: get started with visl. Everything you need is in there. Then, obviously, you have to figure out the specifics of the device you're trying to program; you have to have some supporting material telling you how to program that particular hardware. But the general workflow and the general idea you can get pretty easily from visl, in my opinion.

There are also examples of real codec drivers out there that you can check. I like rkvdec; that was the first codec driver I ever came across when I was getting started with this line of work. The rkvdec driver drives a bunch of Rockchip products; for instance, I have the RK3399 SoC sitting on a board right here in front of me, and it's driven by rkvdec. We have Hantro, a video IP from VeriSilicon, which is present in a number of SoCs out there; you can see it in drivers/media/platform/verisilicon. There's Cedrus, which is again a very cool project; it's reverse-engineering work for Allwinner SoCs, and it's a staging driver; you can check it out at drivers/staging/media/sunxi, I think. So you can look at these real drivers if you want to see how a driver for a real device actually operates, and how it differs from visl.

And to wrap it up, how do I run visl? As we've been saying over and over throughout this presentation, you don't need any hardware; you can run it on your laptop if you want. All you need is a new enough version of GStreamer, version 1.18 at minimum, which was released in 2020. You have to have that installed, and once you do, you modprobe visl, and then you run a GStreamer pipeline. I don't know how many of you are acquainted with GStreamer or know what GStreamer is. GStreamer is a major piece of software on Linux and other platforms, and it's a multimedia framework. I don't know how many of you know, for instance, PipeWire or GNU Radio; from a logical perspective, those operate very similarly to GStreamer. The idea is that you have different elements, and you connect these elements through pads, and by connecting different elements you can carry out a workload. This is what is meant by a multimedia framework: you connect different elements, and then one element will, for instance, parse the stream, another element can talk to V4L2 and have V4L2 decode it, another element can talk to DRM or KMS and have the result displayed on your screen, and so on and so forth.

A simple way to do this without writing any code at all, just from the command line, is through a tool called gst-launch-1.0, which is part of GStreamer. What gst-launch-1.0 does is take a string, and in this string you describe your pipeline, connecting elements together. In the example pipeline I have here, we have three different elements; well, I'd say four, but not really four, and I'll explain why. The first thing is filesrc. filesrc will read some file on your disk, and you control which file it reads through its location property: you point it at some path in your filesystem and it reads the file. The exclamation mark connects one element to the next.
So here I'm connecting filesrc to parsebin, and parsebin is a collection of elements; the details don't really matter for this presentation. Just think of parsebin as telling GStreamer: hey, figure out automatically which elements I need in order to parse whatever filesrc is giving you. parsebin will automatically expand into the right elements needed to parse the video data you're feeding it through filesrc. Then you connect parsebin to a V4L2 decoder element; in this particular case, an element that decodes H.264 video data. This element actually talks to the kernel, sets up the driver, and coordinates with the driver to get a decode loop going, as we've seen in the previous slides. It takes care of getting the memory, sending that memory to the kernel with the compressed data, sending the metadata to the kernel, getting the decoded frame back, getting it into a nice format that GStreamer understands, and passing it along to the next element. And this next element, the last one in the pipeline, is called filesink. filesink gets this decoded data, and all it does is store it for you at some path you give it through its location property. So the whole pipeline is something along the lines of: gst-launch-1.0 filesrc location=/path/to/video.h264 ! parsebin ! v4l2slh264dec ! filesink location=/tmp/out.yuv.

So if you modprobe visl and run this pipeline, it's going to work just as if you had any other driver. If I had rkvdec or Cedrus or any other driver and ran the same pipeline, it would work just the same as far as GStreamer is concerned. So what should I expect? Once you hit enter on that pipeline, GStreamer will start playing your file, and the filesink element, as I said, will write the decoded, quote unquote, data into a file. Then you can inspect what's in there with any program you want. The example I use the most, that people here at Collabora use the most, is a piece of software called YUView. With YUView, you tell it the resolution, it interprets the YUV data for you, and then you can see the actual decoded frames: you can see what visl has written through the V4L2 test pattern generator, all the debug data that has been written into the frame. And you can also use ftrace if you want. Candace, I think I sent you a link at the beginning of the presentation; can you open it, please? Yes, let me take over sharing the screen so I can show that.

While Candace is doing that, there is a question here: why stateless? Would you like to take this question now, Daniel? That's fine, yes. Why was stateless hardware created in the first place? What advantage does it offer for hardware manufacturers or userspace, compared to stateful hardware? Well, for manufacturers, for vendors, it's simpler: you get to have simpler hardware, basically. Because, if you remember, with a stateful device you have to have more stuff: a way to parse the bitstream in hardware, to extract all that data yourself, and to keep track of that data yourself, and when I say yourself, I mean the hardware has to be able to do so internally. So that's more complex, and a stateless device is simpler for a vendor. But there's no magic here, right?
The work that the vendor avoids by having a simpler, stateless device, somebody still has to do, and that somebody is the userspace programmer. The responsibilities handled automatically by the hardware in a stateful device have to be taken care of by the programmer with a stateless device. So you need more userspace code to drive it. You may think: whoa, so it's more complicated for the programmer, for the developer. Yes, but it's not all that bad, for the following reason. We said previously that you treat a stateful device as a black box, and it turns out that we programmers do not like black boxes, right? Because if something's broken, if you're not sure where your mistake is, or if you have broken hardware, there's not much you can do; it's a black box, and once you submit the data, you don't have much control over what's going on. Versus when you have something in software: if it's broken, okay, you just write a patch, people take that upstream, and you have solved your problem, most of the time. So as programmers, I must say, we also like stateless devices: they take more code, it's more complicated, but you're more in control of what's going on.

There is one other question: given that userspace software now has to parse the bitstream, doesn't that mean userspace drivers have to pay royalties for patented codecs like H.264 and H.265? Whereas previously, because they just blindly sent in a bitstream, they wouldn't have to. This is a very good question. I actually have a whole presentation just about AV1 and why we need AV1, and it has to do with patents and basically what this person just asked. What we're seeing in recent times is, for instance, that Fedora, if I'm not mistaken, has disabled VAAPI support for some codecs by default in their distribution. It's a kind of hot-potato situation: they're saying that if you enable, for instance, H.264 or HEVC decoding, maybe you're supposed to be paying royalties, and if you're not complying, that's on you. So people are pushing the problem around, and for Fedora, I think, VAAPI support for a bunch of codecs doesn't come enabled by default for precisely this reason. But I am not really aware of royalty payments being made on behalf of FOSS software nowadays. In GStreamer, for instance, we have a tendency to contain these things in separate modules: the code to decode H.264 and HEVC and so on is contained in a module called gst-plugins-bad, which signals that, again, you may have some issues with royalties and so on. But I'm not aware of payments being made on behalf of free and open source software thus far.

Okay, I think that's all we have in the chat and the Q&A right now. Thanks, Daniel, you can continue. Actually, I'm waiting for Candace. Oh, okay, I think she has that up right now. Yeah, I was looking at my own presentation; it was never going to show up there. Candace, can you open the second link? The other link? Yep, this one. Yes, this one. So, what I want to show here... can you scroll down a little bit? Yep. A tad more? Yes, this is good.
So what I was going to say is, on top of using visl to dump data into the frames themselves through the V4L2 test pattern generator, the other thing you can do is use ftrace. visl has a number of trace events, and through these trace events, using ftrace, you can dump the metadata. What is being shown here is the metadata for some HEVC file that I was playing. You can see, in the line above, how you tell visl what specific piece of metadata you want to display: you echo 1 into debugfs, under tracing/events, into the visl HEVC controls directory, indicating that you want to dump the HEVC controls, and then, specifically, that you want to dump the SPS values for this particular HEVC stream, through the HEVC SPS control's enable file. Then, once you cat the trace buffer, what you're going to see are the values for the different fields that the user space application has submitted to visl. This is also something you can use to debug: you can see the values it has submitted, for instance, for the video parameter set ID and the sequence parameter set ID, you can see the resolution, and basically all the values that you have submitted to the driver will show up in the trace buffer.

Candace, can you scroll down a little bit more? Another thing that's very helpful, in my opinion, is that you can trace the contents of the output buffers using visl. Why is this important? Because, again, another mistake that you can make is to think that the compressed data starts at a given offset when it truly doesn't. Your parser may have a tiny bug somewhere, and now you have computed the wrong offset to the start of the compressed data. If you do this, what's going to happen is that you're going to pass either incomplete data or data that's not really compressed data, the decoder will attempt to decode it, and it's going to crash at some point. So one way you can use visl to help you debug is by dumping those buffers through debugfs and seeing whether you're passing the right data, by comparing with a working implementation.
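Roughly, the ftrace commands behind what was just shown boil down to something like the following; the event group and event names here are reconstructed from memory, so check what your kernel actually exposes under /sys/kernel/debug/tracing/events before relying on them (run as root):

    # enable every visl HEVC control trace event
    echo 1 > /sys/kernel/debug/tracing/events/visl_hevc_controls/enable

    # or enable just the SPS control event, as in the example on screen
    echo 1 > /sys/kernel/debug/tracing/events/visl_hevc_controls/v4l2_ctrl_hevc_sps/enable

    # run the GStreamer pipeline, then read back the metadata user space submitted
    cat /sys/kernel/debug/tracing/trace

The buffer dumps mentioned above appear under visl's directory in debugfs, so they can be compared byte for byte against the buffers produced by a known-good implementation.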
This is what I wanted to show you, Candace. Thank you. No problem. Did you want to start sharing your screen again? Yes, please. Okay, I'm going to go ahead and stop, and then you can pull up yours. Yes. Are you all seeing the presentation again? Yes.

So again, don't forget to play with the different options when loading the module; you can see the different options in the documentation. You can control the amount of information that's dumped, and you can control the particular frame at which you want to start dumping and the particular frame at which you want to stop; you can control all of that through module options.

And we're almost done here. The last thing I want to say is: why should you care? The reason you should care is that multimedia is ubiquitous. It's one of the things we use computers for the most, at least as far as consumer devices go. There is research that was done by Cisco, carried out, I think, three or four years ago, and it predicted that by 2022, 82% of all consumer internet traffic would be video data, especially with the rise of 4K and 8K. So video is one of the things that we use computers for the most as end users, and having a good story in the multimedia stack is fundamental for an operating system. Being able to support more hardware and to provide a seamless environment to encode and decode data is very important for an OS to have. So improving the multimedia stack in Linux makes the entire operating system more appealing as a whole. It's like gaming: it's one of the things that people care about and want to see supported, basically. And in my opinion it's a very challenging and interesting line of work. I like debugging it; I think the problems in this domain are very interesting, and it can be a very fulfilling career path.

And the last thing I want to say is that if anyone here is interested in following along this career path, the V4L2 community can really use more members, more contributors, especially contributors who can one day grow into being maintainers, helping out with code reviews and helping out the community. The V4L2 community is really in need of these people, so there is space for more people in this area, for Linux and for V4L2. And this concludes what I had to say. I hope this was informative for the audience here. Thank you so much.

Thank you, Daniel, and plus one on what Daniel said before: the V4L2 community is very welcoming. The work is very challenging because the devices are complex, and each device might have multiple drivers associated with it. Some of the problems that I solved in that space have been the most fulfilling ones for me. Like Daniel mentioned, the mentors and developers there are eager to help new developers; they are always supportive. So yes, plus one on everything Daniel said. And as for gaming: if we ever want Linux to be successful as a gaming platform, you really need Linux media and V4L2 and video support to be able to succeed in that space. So thank you for doing this webinar, Daniel, I really appreciate it. Thank you for having me here.

Thank you both for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today, and a copy of the presentation slides will be available on the Linux Foundation website. We hope you are able to join us for future mentorship sessions. Have a wonderful day.