All right, welcome back to the FeM channel. Our next speaker is Janik. Janik studied media technology and looked into live streaming on the internet and the algorithms behind video compression, and in this talk Janik wants to tell us about the basics of modern compression algorithms. For the translation, please open the language selection tool in the web player and select the translated stream. And now, enjoy the talk.

All right, and welcome to my presentation, basics of video compression. It's about the basics of video compression in general, which appear in basically every standard. Quickly about myself: I'm Janik, I'm studying media technology at the Technical University of Cologne, and I'm a developer at Owncast, which is an alternative to Twitch that you can put on your own server. You can find me under the links you can see there.

All right, what is this presentation about? The compression of video in general. We're talking about the basic ideas without any particular implementation, and I'm going to orient myself along the MPEG standards, up to H.266, also known as VVC. Most of it applies to the older standards like H.265 as well, and at the end there will be an outlook on H.266. It's just a quick overview, of course; it's not going to replace a six-month lecture.

Let's start with a quick lexicon. This is a pixel. A pixel usually has several channels; those could be red, green, and blue, and there could also be an alpha channel. Sometimes they're named differently. Then there's an image, or a frame. A frame consists of multiple pixels arranged in a grid, such as this one here; 4:3 or 16:9 could be the aspect ratio, or 9:16 if you're on TikTok. And if you have a collection of frames in sequence, you get a video, which I've illustrated here like an analog film strip; the older ones among you might remember those.

So what do we need compression for? Why do we need lossy compression if we just want to put videos on the internet, or record and work with video? I have a napkin computation here. Say we have 3 channels per pixel, as I just mentioned, and 8 bits per channel, which is a typical format. On a full HD image, so 1920 x 1080 pixels at 25 images per second, you get 1.16 gibibits per second, which is a lot of data. In video editing that might still seem acceptable, but if you're streaming this over the internet, from media.ccc for instance, that's just not possible; even TikTok.com is going to say no. And this was a problem even before full HD, with smaller videos too. A 90-minute film in this resolution is 6.2 tebibits, which is way too much. For comparison, that is 1,400 DVDs at 4.7 gigabytes each, or 500,000 floppy disks at 1.44 megabytes.
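As a quick sanity check, that napkin computation fits in a few lines of Python; the figures are the ones from the talk:

```python
# Raw (uncompressed) video bitrate, with the numbers from the talk:
# 3 channels, 8 bits per channel, 1920x1080 pixels, 25 frames per second.
channels = 3          # e.g. R, G, B
bits_per_channel = 8
width, height = 1920, 1080
fps = 25

bits_per_second = width * height * channels * bits_per_channel * fps
print(f"{bits_per_second / 2**30:.2f} Gibit/s")   # 1.16 Gibit/s

# A 90-minute film at this rate:
total_bits = bits_per_second * 90 * 60
print(f"{total_bits / 2**40:.1f} Tibit")          # ~6.1 Tibit, roughly the 6.2 from the slide
```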
But before we talk about how we compress video, we need to talk about what we use video for in the first place. The classic area of use is cinema, or analog television. There you usually have one stable cable to the screen and a hard disk from which the film is played, and that's it. In analog TV you have one cable carrying several signals, several channels, but nothing else is happening there.

That's a bit different if you're watching online or at home, with video-on-demand platforms like media.ccc or live-streaming platforms like streaming.media.ccc. There you have a less stable connection; it could be an internet connection that has issues, or you're on a train and want to stream from there.

Another area of use is video calls, which we probably all know by now thanks to the pandemic, and which were important before as well. Here it's very important to have very low latency so that people don't talk over each other, and that video and audio are transferred well together. Especially during the pandemic it became clear that it's not that terrible if the video isn't perfect or the quality suffers a bit; the important thing is that it's there and that it's shown simultaneously with the audio.

One case that isn't so popular in the chaos bubble, but that I still want to talk about, is surveillance cameras. They usually aren't alone, here you can see three, and they just film 24/7. A lot of data comes together there, even though most of the time maybe just an empty space is being filmed. We want to be able to compress this well, but we don't want to watch a cinematic movie here; we just want to see that someone is walking from A to B, or the face of a person, something like that. So the requirements for video compression with surveillance cameras are totally different from a cinema movie.

The last point I want to mention is image recognition, which is used in medicine to spot anomalies. Here there is a problem if the compression moves things around or makes things unrecognizable, because then you might not be able to see a fracture or a brain tumor anymore.

Now that we have cleared this up, we can talk about how video compression works. The first thing you might think of is simply having fewer pixels by reducing the resolution. And this works pretty well: here we have four greenish pixels and four bluish pixels, and we compress those down to one green and one blue pixel. Now we have saved 75% of the space. But if we want to show this on a large display again, it's not really clear how we're supposed to display this data. We could just scale those pixels up again, but then we get large areas with sharply defined edges and no internal structure. That looks a bit like paint-by-numbers, and it's not really what you want.

The next option is to blend the pixels with their neighbors. You can do this at different levels of sophistication. Here we have a linear gradient, but then you get this gray area in the middle, which comes from interpolating in RGB space. You can do this in more complicated ways too; here we have only two points, but usually you have a two-dimensional image and then you interpolate from four points. And here is the formula for it, which is just a linear combination of the two pixel values.
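Here is a minimal sketch of that interpolation: plain linear blending per channel, plus the bilinear case with four points. Real scalers use fancier filters, but the linear case is exactly this weighted sum:

```python
def lerp(a, b, t):
    """Linear interpolation between two values: (1-t)*a + t*b."""
    return (1 - t) * a + t * b

def lerp_pixel(p0, p1, t):
    """Interpolate each channel of two RGB pixels independently."""
    return tuple(lerp(c0, c1, t) for c0, c1 in zip(p0, p1))

def bilerp_pixel(p00, p10, p01, p11, tx, ty):
    """Bilinear interpolation from the four surrounding pixels:
    first along x, then along y."""
    top = lerp_pixel(p00, p10, tx)
    bottom = lerp_pixel(p01, p11, tx)
    return lerp_pixel(top, bottom, ty)

# Halfway between a green and a blue pixel in RGB you get a
# washed-out mix, the desaturated middle mentioned in the talk:
green, blue = (0, 255, 0), (0, 0, 255)
print(lerp_pixel(green, blue, 0.5))   # (0.0, 127.5, 127.5)
```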
The developers of the video codecs make some assumptions about the video material we want to encode and about how it's perceived by humans. If you trust these assumptions, you can turn them into technical tricks, and that has worked very well so far, so I want to talk about some of these assumptions.

The first assumption is that the human eye can resolve differences in brightness better than differences in hue. If we can separate the hue from the brightness of a pixel, then we can store the hue data at a lower resolution and save bandwidth. So far we've been talking about the RGB color space, so red, green, blue. In the YCbCr color space there is a separate channel for the brightness and then two for the hue: the brightness goes from zero to one, and the blue and red difference channels from minus one to plus one, so the RGB cube becomes this skewed cuboid you can see there. If you put blue and red at their minimum value, you get green, and that is the reason why the VLC player shows green pixels on transmission errors; green pixels can mean a transmission error or that something didn't work in decoding. The idea comes from the switch from analog black-and-white TV to color TV: the old TVs just took the brightness signal out of the combined signal, and the newer TVs additionally took out the hue information and showed it.

To store the hue information at lower resolution you use chroma subsampling, and I've shown the most common modes here: starting from four chroma samples, they are subsampled down to two or just one. The most common mode is 4:2:0, the one on the right, where for one hue or chroma value you have four brightness values. It's used in the MPEG codecs and is the standard in almost all other lossy video codecs. If you count the blocks, you can see that 4:2:0 uses 6 bytes where 4:4:4 would use 12 bytes, and that's great, because it cuts the amount of data in half.

The next assumption is that the image has a foreground and a background, which means there are areas where a rougher or a finer resolution is enough to represent the image. The approach here is called splitting into blocks, so you divide the image into several blocks. JPEG, MPEG-2 and H.264 use a fixed block size; H.265 has a variable block size, which you can see on the right there: the blocks are made smaller and smaller until they don't contain much structure anymore, so the resolution can be reduced more in the flat areas than along this edge. I would like to be able to say more about this, but it doesn't fit into these 20 minutes.
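Going back to the color-space step for a moment, here is a small sketch of the conversion and of 4:2:0 subsampling on a 2x2 block. The talk doesn't name a specific conversion matrix, so the classic BT.601 weights are assumed here; the exact constants and value ranges differ between standards:

```python
def rgb_to_ycbcr(r, g, b):
    """RGB (0..1) to Y'CbCr, assuming the common BT.601 weights."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # brightness (luma)
    cb = 0.564 * (b - y)                     # blue difference
    cr = 0.713 * (r - y)                     # red difference
    return y, cb, cr

def subsample_420(block):
    """4:2:0 subsampling of a 2x2 block of RGB pixels: keep all four
    luma samples, but average the chroma down to a single pair."""
    ycc = [rgb_to_ycbcr(*p) for p in block]
    lumas = [y for y, cb, cr in ycc]
    cb = sum(c for _, c, _ in ycc) / 4
    cr = sum(c for _, _, c in ycc) / 4
    return lumas, (cb, cr)   # 4 + 2 = 6 samples instead of 12

block = [(0.2, 0.8, 0.3)] * 4   # four similar greenish pixels
print(subsample_420(block))
```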
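And for the variable block splitting just described, a toy version. Real encoders decide splits by rate-distortion cost; pixel variance is used here only as a stand-in for "does this block still contain structure":

```python
def split_blocks(image, x, y, size, min_size=8, threshold=100.0):
    """Recursively split a block into four quadrants while it still
    contains structure (here: pixel variance above a threshold),
    roughly in the spirit of H.265's variable block sizes."""
    pixels = [image[j][i] for j in range(y, y + size)
                          for i in range(x, x + size)]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if size <= min_size or variance < threshold:
        return [(x, y, size)]            # flat enough: keep as one block
    half = size // 2
    return (split_blocks(image, x,        y,        half, min_size, threshold)
          + split_blocks(image, x + half, y,        half, min_size, threshold)
          + split_blocks(image, x,        y + half, half, min_size, threshold)
          + split_blocks(image, x + half, y + half, half, min_size, threshold))

# Example: a 16x16 image that is flat except for one bright quadrant.
img = [[0] * 16 for _ in range(16)]
for j in range(8):
    for i in range(8):
        img[j][i] = 255
print(split_blocks(img, 0, 0, 16))   # splits once into four 8x8 blocks
```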
The next assumption is that consecutive frames have the same or very similar content, which means we can transfer just the things that changed and save bandwidth. This principle is called a difference image, or a delta frame: a delta frame just represents the difference to the previous frame. Keyframes are what make skipping possible: if you skip ahead in a video file, the player usually jumps to the next keyframe and continues playing from there. A series of one keyframe followed by multiple delta frames is called a group of pictures; the keyframes are intra frames, and then there are predictive frames, which just contain the difference. In this test video we have the first and the second frame and then the difference between them, and you can see that not much is changing; the tree almost doesn't change at all, for instance. We don't have to transfer that, so we can say: for the next 100 pixels, or in this block, there is no change.

The next assumption is that a sequence of frames represents movement. If we draw a block around an object, then we can just transfer the movement, the translation, of this block. There are two approaches here: global motion compensation and block-based motion compensation. With the global one you move or scale the whole image, which is great for movements like zooming in and out, but this technique isn't used that much anymore; so if you have a zooming shot and your encoder doesn't perform as well as you would expect, that can be the reason. Block-based motion compensation operates on the blocks we've already talked about: each block is searched for within a certain radius in the reference frame, the position with the smallest difference is used as the motion vector, and that area is then filled from the other block. The search can even work with sub-pixel precision, but that depends on the codec. One disadvantage of this motion compensation is that you get uneven seams at the borders of the blocks, which can look a bit strange during movement, especially if there are many of them.
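A toy sketch of the keyframe and delta-frame idea from above; real codecs entropy-code the mostly-zero differences instead of storing them raw:

```python
def encode_gop(frames, gop_size=100):
    """Encode frames as one keyframe (intra frame) followed by delta
    frames (predictive frames) that only store the difference to the
    previous frame. Frames are flat lists of pixel values."""
    encoded, previous = [], None
    for n, frame in enumerate(frames):
        if n % gop_size == 0 or previous is None:
            encoded.append(("I", list(frame)))            # keyframe
        else:
            delta = [c - p for c, p in zip(frame, previous)]
            encoded.append(("P", delta))                  # difference only
        previous = frame
    return encoded

def decode(encoded):
    """Replay the stream: a keyframe resets the picture, a delta
    frame is added on top of the previous picture."""
    frames, current = [], None
    for kind, data in encoded:
        if kind == "I":
            current = list(data)
        else:
            current = [c + d for c, d in zip(current, data)]
        frames.append(list(current))
    return frames

frames = [[10, 10, 10], [10, 12, 10], [10, 12, 11]]
enc = encode_gop(frames, gop_size=3)
print(enc)   # [('I', [10, 10, 10]), ('P', [0, 2, 0]), ('P', [0, 0, 1])]
assert decode(enc) == frames
```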
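And a brute-force sketch of the block-based motion search: for one block of the current frame, every offset within the search radius is tried in the previous frame, and the position with the smallest difference wins. Sub-pixel precision and the smarter search patterns of real encoders are left out:

```python
def sad(frame, fx, fy, ref, rx, ry, bs, width):
    """Sum of absolute differences between a block in the current
    frame and a candidate block in the reference frame."""
    return sum(abs(frame[(fy + j) * width + fx + i]
                   - ref[(ry + j) * width + rx + i])
               for j in range(bs) for i in range(bs))

def motion_search(frame, ref, fx, fy, bs, width, height, radius=4):
    """Try every offset within the search radius and return the
    motion vector with the smallest difference."""
    best, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = fx + dx, fy + dy
            if not (0 <= rx <= width - bs and 0 <= ry <= height - bs):
                continue
            cost = sad(frame, fx, fy, ref, rx, ry, bs, width)
            if best is None or cost < best:
                best, best_vec = cost, (dx, dy)
    return best_vec, best

# Tiny demo: a 2x2 bright patch moves one pixel to the right.
w, h = 8, 8
ref, frame = [0] * (w * h), [0] * (w * h)
for j in range(2):
    for i in range(2):
        ref[(2 + j) * w + (2 + i)] = 200     # patch at (2, 2) in the reference
        frame[(2 + j) * w + (3 + i)] = 200   # patch at (3, 2) in the current frame
print(motion_search(frame, ref, 3, 2, 2, w, h))   # ((-1, 0), 0): perfect match
```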
Now I want to look ahead to H.266, or VVC, which is the same standard. It's developed jointly by the ITU and MPEG, the Moving Picture Experts Group, and the two groups have different naming schemes, which is why the standard has two different names. The goal for H.266 was to use 50% less bitrate at the same subjective quality compared to the predecessor: a video that looks equally good should now need 50% less traffic, or bitrate, than with H.265.

It brings a lot of new tools; I've only picked out a few of them. They changed the block splitting: the brightness channel can now be partitioned differently from the chroma channels, which is great if you have content with structure in brightness but not in chroma, or the other way around. The next point is virtual image boundaries. Take a 360-degree video, which has edges in the middle of the picture because the image is stitched together from six different cameras: at these edges there is no overlap and no movement continuing from one side to the other, so they create forced block borders, and the codec can now be told about them. Another point is the palette mode. On a slide like this one we don't need all the colors, but we do need a high resolution so that small text stays readable. Here we have white, black, and a bit of yellow, but we don't need the whole RGB color space, or YUV, and so on. In palette mode it's enough to define a few color samples and reference them, and then you use less bandwidth for the same picture.

So where is H.266 used? The standard has been approved and finished since 2020, and there has been a version 1 of the first encoder since 2021. But there is still very little support in the usual software, like FFmpeg, the VLC player, or any video editing software. That means it will take more time, but you can already build FFmpeg with an H.266 encoder yourself. Another point is co-processors: there is hardly a smartphone or laptop computer without a co-processor for video encoding and decoding, and for H.266 these don't exist yet, so it will take more time to reach the market. The adoption phase of H.265, for example, took around four to seven years: the H.265 standard was approved in 2013, and it was 2017 before it was used for videos by default.

That's it for my presentation. I used many image sources, which I took from websites and papers. I will put the whole presentation online, so you can click on the links there. With that, thank you very much for your attention, and if you have feedback, I'd be happy to receive it via the links shown.

Thank you, Janik, for this introduction to video compression. We have a small breakout session at the FeM assembly; feel free to visit us in the rc3 world at the FeM assembly. At 10 o'clock there will be something about voting in the elections, and then on Wednesday, as usual, the Herald News Show.

Ah, yes, so, that was the translation of Janik's talk on the basics of video compression, interpreted by Lukas. If you have feedback for us, use the hashtag #c3lingo on social media. Thanks!