All right. Hi. Well, my name is Rostislav. I go by the alias Atomnuker. I work for Mozilla on video compression and I am an FFmpeg developer. What I'm talking about today is something which is kind of unusual for me, because I usually go for technical talks: I confuse everyone in the audience, they hate me, and no one wants me to come back. What I'll be talking about this time is more of a history of what has happened with codecs which have tried to push the boundaries of what is possible with video compression, how a video codec is developed, and how old codecs might also be of some use. The title of this talk, or the whole talk, might have been a bit influenced by a post which was recently published by the chairman of MPEG, in which he said that the development model of AV1 is killing innovation and killing research done on video codecs. And I'm just here to tell you, well, that is completely wrong. And I'll start by doing the exact opposite of proving he's wrong, by asking: how did we get to this point, and which codecs did not help us get here? So we're at a point now where we have a standard set of coding tools which are used by pretty much all video compression codecs everywhere. You have a motion vector search. You get a Q index for the frame, which is part of the rate control system, so it's not really needed to get this right. You do a rate-distortion (RDO) search for each block. Then there's scalar quantization, a loop filter, some entropy coding, and most of these tools were present in, well, all of them were actually present in, MPEG codecs since the 90s. So you might say, well, MPEG has completely revolutionized video coding, and that's partly true. They've shown us a way of doing things which is pretty much guaranteed to work: if you implement all of this, you can just put together a new codec and it will work just as well as any other MPEG codec.
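To make that list of standard tools concrete, here is a minimal C sketch of just the scalar quantization step: a per-frame Q index selects a step size, each transform coefficient is divided by it at encode time and multiplied back at decode time. The `qindex_to_step` mapping here is invented for illustration; real codecs use carefully tuned tables.

```c
/* Hypothetical Q-index to quantizer step size mapping (illustration only). */
int qindex_to_step(int qindex)
{
    return 4 + qindex * 2;
}

/* Round-to-nearest scalar quantizer over a block of transform coefficients. */
void quantize_block(const int *coeffs, int *levels, int n, int qindex)
{
    int step = qindex_to_step(qindex);
    for (int i = 0; i < n; i++) {
        int c    = coeffs[i];
        int sign = c < 0 ? -1 : 1;
        int mag  = c < 0 ? -c : c;
        levels[i] = sign * ((mag + step / 2) / step);
    }
}

/* Dequantizer: reconstruct coefficients from the transmitted levels. */
void dequantize_block(const int *levels, int *recon, int n, int qindex)
{
    int step = qindex_to_step(qindex);
    for (int i = 0; i < n; i++)
        recon[i] = levels[i] * step;
}
```

The reconstruction error is bounded by half a step, which is exactly the knob the rate control system turns: a higher Q index means coarser levels, fewer bits after entropy coding, and more distortion.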
But I'm here to talk about the codecs which came from left field, so to speak. One of these codecs was Dirac. Dirac was an attempt at making a next-generation codec by the BBC. They were planning to use it in their broadcast networks, they were planning to have hardware support, they had vendors lined up, they had all sorts of advanced coding techniques, but it was all let down by the very thing it tried to implement, which was wavelets. See, wavelets came around more or less in the 90s, and people were convinced that they were the future of video coding. What they didn't realize was that wavelets, while they work very well for image coding, are not very practical for video coding, where you constantly have to copy parts of one image to another part of an image, and wavelets didn't really handle that well. Another codec which tried to revolutionize coding, image coding this time, was JPEG 2000. It's newer than JPEG, so you would expect it to be far better than JPEG, but it isn't. It uses all kinds of new techniques to try to compress images better, but it doesn't do a much better job than JPEG, and it's such a shame that nowadays it's being used in the cinema industry, and VLC are complaining that they cannot decode DCPs fast enough. Another codec which tried to revolutionize video coding was SVQ3. The reason why they failed was their business model. They didn't publish anything, and people had to reverse engineer the codec from their tools, and if people have to reverse engineer it then they're probably not going to bother writing encoders, so the codec won't develop in the open-source world, and it won't develop in the commercial world either, because there's no real support for it.
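As a taste of what Dirac and JPEG 2000 were built on, here is a minimal sketch of one level of a 1-D integer Haar wavelet transform in lifting form. Real codecs use longer filters, such as the 5/3 or 9/7 wavelets, applied in two dimensions, but the structure is the same: a predict step produces high-band differences and an update step produces low-band averages, and the inverse undoes both exactly.

```c
/* One level of a 1-D integer Haar wavelet transform in lifting form.
 * Splits n samples (n even) into n/2 low-band and n/2 high-band values.
 * Reconstruction is exact because the inverse applies the identical
 * expressions in reverse order. */
void haar_forward(const int *in, int *low, int *high, int n)
{
    for (int i = 0; i < n / 2; i++) {
        high[i] = in[2 * i + 1] - in[2 * i];  /* predict: difference   */
        low[i]  = in[2 * i] + (high[i] >> 1); /* update: rounded mean  */
    }
}

void haar_inverse(const int *low, const int *high, int *out, int n)
{
    for (int i = 0; i < n / 2; i++) {
        out[2 * i]     = low[i] - (high[i] >> 1); /* undo update  */
        out[2 * i + 1] = out[2 * i] + high[i];    /* undo predict */
    }
}
```

For still images this works beautifully: the low band is a half-resolution copy you can transform again, and the high bands are sparse and cheap to code. The problem he describes is motion: block-based motion compensation copies rectangles between frames, and rectangle edges don't line up with the overlapping wavelet basis functions.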
Another one was Daala, which tried to do things completely differently in order to evade software patents, and in the end it caught up to HEVC and was considered to actually surpass it in terms of image and video quality. Unfortunately, it was, well, deprioritized in order to work on AV1, which was kind of a success actually, because the outcome was that two major coding tools were ported to AV1 and one was rewritten. Finally, you might think that I'm biased and that MPEG itself had no bad codecs and was always perfect in everything it did. Well, that's not true: with MPEG-4 they completely messed up. It was an insanely complicated and massive codec which required people to actually write good software implementations in order to popularize it, and that is part of the reason why it became popular: because of encoders like Xvid and DivX. But it brought something to the table as well: global motion compensation. So all of these codecs either didn't realize their potential, or had the wrong business model, or were deprioritized, and so on. But what does it take to develop a new codec? Well, you need to balance... Did I miss a slide? No, I didn't. You need to balance all of these three things: you need to satisfy hardware complexity, you need to satisfy software complexity, and you need to satisfy compression as well. And finally, you also need to satisfy the actual licensing, as well as any patents. If you don't do that, then you suffer the fate of the previous codecs which I talked about.
So, well, the first two are inherently incompatible. What is easy in software is usually kind of difficult in hardware, and what is easy in hardware, stuff like conditions and branches, is a pain to work with in software, because of branch predictors and mispredictions and things like that, which reduce performance quite a lot. And if you reduce performance, you generally increase complexity as well, because you either have to try to make it simpler, or you just leave it as is and suffer the penalty. But eventually, among the components which I listed in my first slide, which are the essential parts of the standard way you write a video compression codec nowadays, one or most of them are going to bottleneck. So you can constantly improve: you can reduce software complexity, you can increase compression, but in the end one of these tools is going to give up, it's going to bottleneck you in terms of compression, so you have to replace it. And where exactly do you get ideas on what to replace it with? Well, you pick them up from old codecs. So, like I said, Daala contributed some tools to AV1. MPEG-4 had a little bit of a contribution as well. B-frames didn't make an appearance, because they were unneeded: you could do more flexible things with the way that VP9 had its invisible frames, so there was no need for B-frame support. So in the end you need to know where to take changes from, you need research, and since no one wants to compromise on compression or on any other coding tools, you really need to pick ideas either from research papers or from old codecs. So if you want to see what's going to be in AV2, or AV1 plus 9000 or something, you need to look into what older codecs did and what research has been going on, because while the codecs themselves didn't make it in terms of popularity, they did leave research behind which might still be developed, you never know. Wavelets are still being researched, and they have found a bit of application in
terms of error resilience, because wavelets are very resilient to errors. AV1 does not include wavelets, but who knows, maybe AV2 will. So, I'm finishing a bit early, and you might think that I should just end it, but I won't, because I have a small encore. See, I've been working on Vulkan recently. Vulkan, if you don't know, is a new API for accessing the GPU, and what it allows you to do is completely control the GPU, in order to know predictably what it will do and to have consistent performance on all vendors and drivers. The reason why Vulkan has not yet been used, and why other APIs like Direct3D and OpenGL have never been used, in video encoding is that those APIs were quite limited in what they allowed you to do, and they were extended an infinite number of times, like with OpenGL's framebuffers, which kind of allowed you to render off-screen. But Vulkan completely redoes the way that you access the GPU, and it allows you to do things which were impossible with conventional APIs. One of the issues with using Vulkan, or the GPU in general, to do video encoding is getting the data there in the first place. With OpenGL you had to create a texture and then you had to upload it in some way, and this was CPU bound, so even though you could have a hybrid kind of encoder, you still had to have a dedicated thread to upload images to the GPU and download information back. And having another thread usually scares people who work with OpenGL, because OpenGL has a global context state which is hard to work around. Thankfully, Vulkan gives you many, many ways in which you can upload data to the GPU. One of them is the traditional way of doing it: you have a linear image lying around in your host RAM. What you can do is create a Vulkan linear image, map the memory which backs that image, and memcpy your image there. You don't need any global thread state, you don't need to mess around
with anything, you just map it, upload it, and it's there. You still use the CPU, however, so you still need a separate thread. Another way, since a linear image, which is to say one where the data is laid out exactly as it is in your host RAM, is something GPUs hate because of cache coherency, is to have a device-local, optimally tiled image: you create a host-visible linear buffer, you map it, you upload into it, and then you use the command queues to actually do the copy. What you can do instead, which Vulkan finally allows, still via an extension, is create a host-visible image exactly like in the first method, but instead of using this image to do your calculations on, you copy it via an asynchronous queue to another image which is optimally tiled. And the best part is that you can actually import file descriptors or CPU host memory, and then the driver, depending on how well it's implemented, can do the copying for you. So you don't have to have a separate thread to copy things to the GPU, you can just send it right away. It will create a temporary image, but in theory this will be much faster than anything before allowed. So that is one issue Vulkan allows us to solve. Another issue that comes with using the GPU to do encoding is that it's not very practical in terms of what it offers. Video encoding is usually very sequential: you have a bunch of coefficients, you need to quantize them, you need to encode them, and you have a very large number of dependencies, so each coefficient depends on the previous one, and each bit inside the bitstream depends on the previous one. So things like doing entropy coding on the GPU are simply impractical and infeasible with any API or any GPU. But what you can do instead is take an algorithm which does things like searching for block sizes and psychovisual weights on each part of the image, and export that to the GPU, and since you get the uploads for free, if you have a lookahead
which is sufficiently large, so a few frames, then there's no problem with just exporting this whole process to the GPU, so the CPU can instead do things like searching more block sizes and motion vectors and quantization indices for each block. What you could also do is search for motion vectors on the GPU; this would also save you a large amount of time. There's quite a lot of research that went into using the GPU for motion vector search, there are quite a lot of algorithms, there are a lot of research papers, but unfortunately there's very little actual code, and most of the research involves CUDA, which is the standard way you use the GPU for anything right now, and it's vendor locked, and it's horrible, and you need proprietary drivers to actually use it, so it's a no-go for most people. But thankfully Vulkan gives you a generic path for all GPUs and for all real operating systems. Not this one, not on the Mac, because it's not available on the Mac; it's available on Linux and anything which implements it. The Mac doesn't, because they want to have their own Metal thing, which, if you have looked at it, is exactly like Vulkan in absolutely every way. They even call things the same way Vulkan does, so you could potentially just do a string substitution and end up with something working in Metal, but that would mean actually having to template every single bit of code you write. So it's unclear why they haven't allowed it, but thankfully there is an open-source alternative which lets you interoperate with Metal through a Vulkan API. So this solves the second issue: you can still use the GPU for something in video encoding. Now, there's a third issue when using the GPU to encode, and it has to do with memory management. You see, allocating memory on the GPU is kind of expensive, because you have to go to the driver, and then the driver has to do all the management: it has to find memory, it has to set it to be yours, and let's not
forget that GPUs are quite a lot like CPUs with a very large number of threads, so they still have a memory management unit which maps physical addresses to virtual addresses and back. So one of the things Vulkan allows you to do is lazily allocate memory on the GPU, which is to say that you don't have to allocate all of your memory at startup. Instead, you can let the GPU handle things like temporary buffers or temporary images for in-between pipeline image or information transfers, so you don't waste memory unless it's actually needed. Lazily allocated images also let you completely eliminate the need to do any kind of image or buffer copying, because the GPU, or the driver itself, is usually smart enough to know when it's safe to just reuse that bit of memory without actually having to copy it. So potentially, lazily allocated memory can save you memory, and the time spent actually doing a GPU-to-GPU memcpy, if the compiler is smart enough and if you manage your buffers well enough. Another issue is that if you don't reuse images, so for instance if the encoder or decoder you're running supports any number of pixel format changes or width changes or height changes, then you may end up in a situation where you constantly destroy your buffer pool and then reallocate it with the new pixel format. But thankfully, using Vulkan and its new multiplane extension, you can allocate many multiplane images at once, and the driver should internally pool those resources and decide which one needs allocation by checking which of the previous images it can reuse. And finally, the last issue with using the GPU to do encoding is that you need to get the data back somehow, and thankfully Vulkan, like with everything it does, gives you many ways to actually get the information back. You can put it in a host-visible buffer and map it; you can even export memory via a file descriptor and then seek into it and read it like a standard file descriptor, and the
driver will do the copying or any management for you if it has to, or it can just do it in place. So this is my work-in-progress tree right now. It doesn't do the CPU-less uploading yet, that's all work in progress, but it does do arbitrary filtering and it is able to import DRM frames. It cannot do anything useful with them because of an oversight in the spec, but it's all there, you can test it, and hopefully it will get merged into FFmpeg soon, so you can give it a try, you can run arbitrary shaders and so on. So, any questions? Sorry? The hack? Oh, did I have to hack anything? No, actually, there's an external handle you can use to import memory in Vulkan, and this happens via a new extension which was just posted, and it does work. However, the image you get is tiled, so if you try to read it like a tiled image and copy it to another image, it will end up scrambled, because the driver doesn't yet support detiling using DRM images as a source. But I've contacted the Mesa developers, and they're planning to put up a new extension which will allow you to do that. There's also another extension in development which will allow you to use the same path as for DRM importing to import OpenCL and VAAPI images, so you could potentially do processing right on VAAPI-decoded images without any overhead on the CPU. Great. Another question? I think somewhere there... no? Well, it makes sense, because I went quite a lot off topic, because I found out at a late stage that my original topic was a bit too small to actually fill 25 minutes, but thankfully this is better, much better. Yeah, it is, isn't it? Last chance for a question. Since FFmpeg involves much more than MPEG, do you think of renaming the project at some point? No, thankfully, because, look, it's now going to be cool: once MPEG goes down, it's going to be cool in a retro sort of way. Thank you. Sorry? Oh, last question perhaps, yes?
You mentioned CUDA but proposed using Vulkan instead? Yep. But as I understand it, OpenCL is the open equivalent to CUDA, isn't it? So why are you using Vulkan instead of OpenCL? Well, because Vulkan is newer. No, actually, because Vulkan allows you to do things which you aren't allowed to do with OpenCL. Granted, OpenCL is more comfortable to work with than Vulkan's SPIR-V format, which means you need a separate compiler in order to actually compile Vulkan kernels into SPIR-V and then upload that somehow through the API to the GPU. But Vulkan does allow you to do more things with the GPU than OpenCL, and it's newer, which is better. Okay, thank you, Atomnuker. All right, thanks.