All right, so I'm Adam Yooker. I'm an FFmpeg developer. I've done a lot of work on audio codecs and video codecs, I was crazy enough to write a Daala decoder a year ago, and I wrote the first Opus encoder not based on libopus, which I released just a week ago.

What I'm going to be talking about today is AV1. For those of you who don't know what AV1 is, I've prepared a slide for that. AV1 is supposed to be interoperable and open, so it's always going to be royalty-free, and it's going to be optimized for the internet, so that many of the people in this room can directly benefit from it: the increase in video compression, the royalty-free licensing, and all the rest. It's going to be scalable to any modern device and any bandwidth. It's mostly going to be optimized for hardware, so that you can easily decode it with little power on a hardware decoder, and encode it as well, with varying degrees of success. And all the other stuff which you see all the time in marketing. This slide, by the way, I stole from the AOM website, so that you know I would never write something like that myself.

These are the real bullet points. It's going to be royalty-free. It's already in development and it's already open, and lots of companies are participating in the development of this codec.
All the major players are involved: Amazon, Netflix, Google, YouTube, Mozilla and so on. So whatever happens, the codec will see adoption. What's important now is to get the codec to be competitive with the other codecs that are being developed right now, and also to make it an upgrade over any codecs which are currently in use.

Now, unfortunately, not all companies have joined, so there is still some intellectual property out there which we cannot use. That means we have to get clever and work around it in some ways, and the whole process of working around it means rediscovering new ways of doing things, which may or may not be more efficient.

Moving on: the reference encoder is based on VP9, without VP8 support but with bug fixes. Some of those bug fixes were meant to go into VP9 but didn't, so they made it into VP10; VP10 got turned into AV1, and so we carry on. The way development works is that companies contribute experiments, and these experiments make it into the git master of the encoder. After being integrated, the experiments are supposed to go through intellectual property review: a team of lawyers looks at the description of the coding tool and tries to find patents, expired or not, which match the description of what the tool does. Eventually, after the bitstream has been frozen, which should happen around Q4 of this year, all the experiment flags should be removed. At least that's what is meant to happen in my imagination; it hasn't happened yet.

About the experiments: there are currently 50 of them, and granted, some of them don't sound appealing or don't sound like they contribute anything, like hardware emulation or bitstream debugging, but the point is that most of these experiments increase coding efficiency. And there are 50 of them, being constantly developed and updated.
So there's a whole lot of development going on right now. I cannot really cover every single one of the 50 experiments which are currently in the codec, not to mention that every month new experiments get added and some experiments get enabled by default, so it's kind of difficult to keep track. But I'll go through some of the coding tools which have been demonstrated to give gains and are good candidates for being part of the codec and the specification later on.

The first tool I'll cover is directional deringing. Directional deringing was something imported from Daala, and it was required in Daala because Daala used overlapped transforms, and most of the coding techniques in Daala contributed some degree of ringing in the image. So the deringing filter was developed for Daala, but it turns out that since it sits at the very end of the decoding process, you can just as easily paste it into any codec and it will just work. The way it works is you first segment the image into 8x8 blocks, and then you scan for a direction inside each 8x8 block.
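As a rough illustration of that direction scan (this is a toy sketch, not the actual AV1 code, and the candidate set here is invented for brevity): for each candidate direction, group the pixels of the 8x8 block into lines along that direction, and pick the direction along which the pixels deviate least from their line means.

```python
# Toy sketch of the deringing direction search (NOT the real AV1 code).
# For each candidate direction we group the pixels of an 8x8 block into
# "lines" along that direction, then measure the squared deviation of the
# pixels from their line means; the direction with the least error wins.

def direction_error(block, line_of):
    # block: 8x8 list of lists; line_of(i, j) maps a pixel to a line index.
    lines = {}
    for i in range(8):
        for j in range(8):
            lines.setdefault(line_of(i, j), []).append(block[i][j])
    err = 0.0
    for px in lines.values():
        mean = sum(px) / len(px)
        err += sum((p - mean) ** 2 for p in px)
    return err

# Four toy candidate directions (the real filter checks more).
CANDIDATES = {
    "horizontal": lambda i, j: i,
    "vertical":   lambda i, j: j,
    "diag_down":  lambda i, j: i - j,
    "diag_up":    lambda i, j: i + j,
}

def best_direction(block):
    return min(CANDIDATES, key=lambda d: direction_error(block, CANDIDATES[d]))
```

A block whose rows are each constant, for instance, is perfectly "horizontal": every horizontal line has zero deviation, so that direction is chosen.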
You do that with something like a least-squares fit, I think. The exact direction doesn't matter that much; what matters is that the filter then does conditional replacement. Instead of blurring out any artifacts, and by extension any detail in the image, it only acts on very obvious ringing patterns. If a single pixel deviates by more than some threshold, which varies as a function of how far away the pixel is from the direction vector, it gets replaced with some kind of an average. It works really well, and it gives around two or three percent improvement, I think. It's also easily SIMD-able, and it's currently enabled by default in AV1, so this one is probably going to make it into the final version of the codec.

Another tool is PVQ. PVQ is going to give by far the most gains, but it's by far the most difficult tool to integrate. You can think of PVQ as a black box into which you can insert any kind of coefficients in the frequency domain, and PVQ will predict the current image from whatever you give it to predict from. That can be previous pixel values, previous coefficient values, or, for instance, coefficients from luma when you want to predict chroma. The way it works, well, I'm not really going to explain it, but you can see from the diagram that if you imagine the coefficients inside a block as a vector, you can describe that vector as pointing to the surface of an n-dimensional sphere. If you have another vector, which is what you want to predict from, you can do a Householder reflection, and then you just send an angle and a vector which is n minus one values long.

I just wanted to spend some time discussing PVQ search, because I think PVQ search is an important problem which also needs to be solved and made faster. We currently have an implementation which is carried over from Opus, because
Opus used a PVQ search as well. We do RDO on the PVQ search, so that's a bit different, but the root of the problem is very simple. If you have a vector and you want to quantize it using a PVQ search, you normalize the vector to its L2 norm, the Euclidean norm: you sum up the squares of each component, take a square root, and then divide each component of the vector by that square root. And you do the same for the quantized output vector. So it's a simple problem, but it gives great results, and I think it's the way to do vector quantization. If some of you want to go ahead and give improving it a try, you would improve performance not just in Daala and not just in Opus, but everywhere else as well, so that would be useful.

The properties of PVQ are also what make it very interesting. Using PVQ, we can vary the codebook, so we can optimize it, for instance, for areas which have low contrast. This is what activity masking tries to do: it attempts to provide better resolution in low-contrast areas. This is in contrast with HEVC, which attempts to aggressively remove detail from the original image; anyone who has done any encoding with HEVC will know that SAO is basically a tool which you turn off immediately if you see any bad results. Activity masking is something which is difficult to bring into AV1, since it requires a different distortion function, but we're actively working on turning it into something usable in AV1.

Another tool which is also imported from Daala is chroma from luma, and since we do PVQ entirely in the frequency domain, this means that we can also inject values from luma coefficients and more accurately describe what exactly the details in the chroma are.
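The core of that quantization step can be sketched like this (a toy greedy search, not the Opus or Daala implementation, and without the RDO part): distribute K unit pulses one at a time so the quantized vector points as close as possible in the direction of the input, then L2-normalize both.

```python
import math

# Toy PVQ quantizer sketch (NOT the actual Opus/Daala search).
# K unit pulses are placed greedily; each pulse goes to the position that
# maximizes the correlation <x, y> / |y| after placement. The result is
# then L2-normalized so it lies on the unit sphere, like the input.

def pvq_quantize(x, k):
    n = len(x)
    y = [0] * n
    for _ in range(k):
        best, best_gain = 0, -1.0
        for i in range(n):
            step = 1 if x[i] >= 0 else -1
            y[i] += step                      # tentatively add a pulse
            norm = math.sqrt(sum(v * v for v in y))
            gain = sum(a * b for a, b in zip(x, y)) / norm
            y[i] -= step                      # undo the tentative pulse
            if gain > best_gain:
                best, best_gain = i, gain
        y[best] += 1 if x[best] >= 0 else -1  # commit the best pulse
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]
```

For example, quantizing [3.0, 1.0, 0.0] with 4 pulses lands the pulses as [3, 1, 0], which after normalization is exactly the input direction. The real search is much faster because it avoids recomputing the sums from scratch for every candidate position.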
You might ask: how does it work when you have subsampled chroma? Well, we use TF switching to throw away the extra detail. Remember, this is all in the frequency domain, so you cannot do rescaling and conversion in the spatial domain; we're entirely in the frequency domain. The resulting coefficients are then used in the transform, and that's one of the difficulties in implementing chroma from luma in AV1, since AV1 currently has two different transforms: there's the DCT, which is the standard way of doing transforms, and there's the ADST, which in some circumstances gets you better results. And there are more transforms planned to be added to AV1. So chroma from luma is somewhat difficult to implement, but the results really show a big improvement in chroma detail, and in luma detail as well, since in order to better describe chroma you need better luma. That's why CfL is looking like a nice feature for the final version of the codec.

One note: it works for 4:4:4 and it works for 4:2:0, but it doesn't work for 4:2:2. So if you want to use 4:2:2, please don't; just use 4:4:4 or 4:2:0, or just increase the bitrate. You can always just increase the bitrate, like the old 80-megabit MPEG-2 streams, and pad the bitstream. But in order to pad the bitstream, and there's a convenient segue here, you need to ensure that the rate control system will not overshoot, or at least not overshoot grossly, since if you plan to pad then you obviously want to pad up to some target bitrate, and you don't want to overshoot it, of course.
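The basic idea behind chroma from luma is a per-block linear model, chroma roughly equal to alpha times luma plus beta. The sketch below fits that model by least squares; note it is an illustration on plain values, whereas Daala and AV1 do this on frequency-domain coefficients, and the function names are mine, not the codec's.

```python
# Toy chroma-from-luma sketch: fit chroma = alpha * luma + beta for a
# block by least squares, then predict chroma from the luma values.
# Real CfL in Daala/AV1 works on transform coefficients; flat lists of
# values are used here only to keep the illustration short.

def cfl_fit(luma, chroma):
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    var = sum((l - mean_l) ** 2 for l in luma)
    alpha = cov / var if var else 0.0
    beta = mean_c - alpha * mean_l
    return alpha, beta

def cfl_predict(luma, alpha, beta):
    return [alpha * l + beta for l in luma]
```

When the chroma plane really does track luma linearly, the prediction is nearly exact, and only the small residual has to be coded; that is where the gains in chroma detail come from.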
So this is what I'm currently working on: I'm trying to fix the rate control system in AV1 by scrapping it and inserting the rate control system from Daala, which was also the rate control system from Theora, ported to Daala. The rate control system I'm working on basically tries to predict the number of bits that the current frame will use, and it does that with a simple model in which a scale value is what gets modified from frame to frame. The codec will first use the scale value from the previous frame to predict the bits the current frame will take; then, after the frame has been encoded, it measures the real scale value, since you know how many bits you've used and you know the quantizer. Alpha is just an exponent, which is different for each frame type. It will smoothly transition the scale from one frame to the next using a second-order Bessel filter, which also throws away any extremes in the final quantizer values being used, and as such it will give a smoother visual experience, without the gross overshoots that the current rate control system produces.

Just to share something with you: I have seen 200 megabits sustained for five or so seconds on a stream directly from YouTube. Granted, it was VP9, and it was 4K at 30 frames per second, but still, nothing warrants 200 megabits of continuous usage for a few seconds.

This rate control system will also support an easy way of providing chunks to encode. Instead of encoding separate chunks of a few seconds' worth of video, you can just signal a reset in the two-pass mode of the rate control system, and it will reset all the statistics and you can just continue encoding as
if you'd just started a new encode.

Another feature I'd like to talk about briefly is rANS. rANS is being developed by Google, and it offers big improvements in decoding speed, but unfortunately there are some drawbacks. It needs a big enough buffer in the encoder to store all the symbols before reversing them and writing them out as a bitstream, because it works kind of like a stack. There are some hardware manufacturers which aren't content with having a huge buffer, but in the end it's either this or the Daala entropy coder, both of which have the same efficiency; rANS has the advantage of higher decoding speed.

There are also some experiments which I can't really dedicate the entire talk to. There's ext_tx, which will give more transforms, which, as I already mentioned, will be a bit of a problem for CfL, but I'm sure we can make do with it somehow. There's also an adaptive coding order.
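The stack-like behaviour of rANS mentioned above can be shown with a minimal coder (a toy with an unbounded integer state and a fixed two-symbol alphabet; real implementations renormalize to fixed-width words): the encoder has to walk the symbol buffer in reverse so the decoder can pop symbols back out in forward order.

```python
# Minimal rANS sketch (unbounded-precision state, static frequencies).
# NOT AV1's implementation; it only illustrates why the encoder needs a
# symbol buffer: rANS is last-in-first-out, so symbols are encoded in
# reverse for the decoder to emit them in forward order.

FREQ = {"a": 3, "b": 1}   # toy frequencies, summing to TOTAL
CUM  = {"a": 0, "b": 3}   # cumulative frequencies
TOTAL = 4

def rans_encode(symbols):
    x = 1
    for s in reversed(symbols):          # the buffer-and-reverse step
        f, c = FREQ[s], CUM[s]
        x = (x // f) * TOTAL + c + (x % f)
    return x                             # the whole message is one integer

def rans_decode(x, n):
    out = []
    for _ in range(n):
        slot = x % TOTAL                 # which frequency slot we're in
        s = "a" if slot < CUM["b"] else "b"
        f, c = FREQ[s], CUM[s]
        x = f * (x // TOTAL) + slot - c  # pop the symbol off the state
        out.append(s)
    return out
```

Decoding is a single multiply, modulo and table lookup per symbol, which is where the speed advantage over a classic binary range coder comes from.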
It turns out that the old zigzag scan may not result in the best coding efficiency. If you have some other patterns you can use, and you do RDO to figure out which pattern is best, then, if the experiment turns out to give improvements, it will also make it in. And there are 64x64 transforms, which I'm not quite certain will make it in, but which provide big gains on any kind of large uniform area.

With that I'd like to end the presentation. If there are any questions, I'd be happy to answer them.

Yeah, that's the million-dollar question. So the question was: are we going to support interlaced video? Well, I'm not sure. No comment. I cannot say no. You know how it is: it's not designed by committee, but it is still designed by some people with various interests, so whether it will make it in or not, I cannot say. But many of us have strong opinions about it not making it in. Very strong opinions.

So the question was: wasn't the bitstream going to be frozen in March of this year? Well, the bitstream was going to be frozen last year, and it was going to be frozen many times before that, but you know how it is, you just have to keep extending, because some features really need more research to improve them. What's important is that we don't want to delay it too much, because by then H.266 will be out and we'll have to compete with that, and we want to get adopted before that, so that fewer people will adopt the less royalty-free video compression alternatives.

Anyone else? Yeah. So the question was: have we looked into storing motion information as a kind of 3D type of thing, where the third dimension is time, I presume, right?
Well, there was an experiment that Xiph tried; it was a codec, I don't remember its name, but it used wavelets in three dimensions to try to compress video. As far as I remember, you could actually see information from images ahead of time, before they happen, as kind of ghostly images. But apart from that, nothing is being looked into, because it's just so radically different, and it sounds like, and is, difficult to do enough research to get something which is actually implementable, gives coding gains, and doesn't require much hardware to implement. So I don't think that's the way to do it, and I don't think it will be the way to do it for the next 50 years or so.

All right, there's a question in the back.

Okay, so the question was: are we going to work on MP4 encapsulation, right? Well, let's first get a codec which is presentable, and then we'll work on MP4 muxing, because as it is right now, all of us are working on improving the codec. But I'm not sure why you'd want to use MP4, because MP4 is, what, 20 years old now? If it ain't broke, yeah. Well, regardless of whether it's in MP4, it will mostly be used on the internet, and right now WebM is the de facto standard for video and audio on the internet, so it will first be implemented in WebM. Sorry about that. Any other questions? Yeah.

So the question is: are there plans for non-tile multithreading? Well, there will always be frame threading, in the decoder at least, and there will always be tile threading as well. Right now there are no plans to drop tile threading in the decoder, but I think there were some plans to drop frame-parallel decoding, so for that you'll have to ask Thomas Daede.

All right, one more question, someone? All right, yeah. Right.
So the question is: how do you decide which coding tool is implementable in hardware, and which one to actually implement in the codec? The answer is that during initial review you'll get some feedback on whether the tool you're trying to contribute is feasible. And finally, after IP review, or around that time, after it's in git master, the hardware companies which are part of the Alliance will go over the coding tool and try to see whether it's implementable using current hardware decoder production means. If they give it a go, then, you know, they give it a go and it goes in.

All right, last question. How is the compression ratio? Well, right now we're doing quite a lot better than H.264. We're doing better than H.265 on basically all metrics, and after the coding tools I've mentioned are implemented, we'll maybe do slightly worse PSNR-wise, but PSNR is a horrible metric, so perceptually we will look quite a lot better than anything which is currently out there.

All right, well, that's it. So, thank you for having me, I guess.