Video Compression – Learning Outcomes of the session. Before moving to the session, let me ask you one question. What are the popular video file formats used for videos on computers, DVDs and VCDs? You can pause the video, think about the question, write down your answer in your notebook, and then come back and resume the video. Let me tell you the answer. The general formats used for video are AVI, MP4, WMV, MKV, MOV and FLV (strictly speaking, these are container formats that hold the encoded video). Video Compression – A video is nothing but a sequence of frames displayed in rapid succession, and each frame is nothing but a digitized picture. A video may run at up to 60 frames per second, so a 3-second clip at that rate already contains 180 frames. Compression is therefore necessary for transmitting video over the internet or a network; moreover, a compressed video occupies less memory on storage devices. Video compression is the process of encoding a video file in such a way that it consumes less space than the original file. In other words, a frame is a spatial combination of pixels, and a video is a temporal combination of frames sent one after another. Compressing a video therefore means spatially compressing each frame and temporally compressing the set of frames. This can be done by eliminating redundant and non-functional data from the original video. Let us see what spatial compression is. Reducing the video file size by compressing the pixels present in each frame independently is called spatial compression. This type of compression is also called the intra-frame method: each frame is a picture that can be independently compressed. Spatially redundant elements are duplicated elements within a structure, such as pixels in a still image or bit patterns in a file; these are removed by spatial compression.
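The storage cost that motivates compression can be computed directly from the frame rate and frame size. This is a minimal sketch; the 640×480 resolution and 60 fps are example values chosen for illustration, not taken from any standard:

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of uncompressed video: every frame stored as full RGB pixels."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * seconds

# A 3-second clip at 60 frames per second contains 60 * 3 = 180 frames.
size = raw_video_bytes(640, 480, fps=60, seconds=3)
print(size)  # 165888000 bytes -- about 166 MB for just 3 seconds, uncompressed
```

Even this modest clip needs roughly 166 MB raw, which is why both spatial and temporal compression are applied before storage or transmission.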
Spatial compression of each frame is done using JPEG compression, where JPEG stands for Joint Photographic Experts Group. Next, temporal compression. In temporal compression, redundant frames are removed. In practice, only a small portion of each frame is involved in motion, so there is redundancy between successive frames. Hence, sending only the information relating to those parts of each frame that have movement associated with them saves data bandwidth. Video compression principles. The techniques used for video compression exploit the high correlation between successive frames: the idea is to predict the content of a frame from nearby frames. This can be done from the preceding frame and, in some instances, a succeeding frame. Instead of sending the source video as a set of individually compressed frames, only a selection is sent; for the remaining frames, only the difference between the actual frame contents and the predicted frame contents is sent. Let us see the MPEG-1 compression standard. MPEG stands for Moving Picture Experts Group, and the MPEG-1 standard was finalized in 1991. The MPEG-1 standard was primarily used for multimedia CD-ROM applications at a bit rate of 1.5 megabits per second. The standard is generic in the sense that it specifies a syntax for the representation of the encoded bit stream and the method of decoding. The standard supports operations such as motion estimation, motion-compensated prediction, the discrete cosine transform, quantization, and variable-length coding. A number of parameters can be specified in the bit stream itself, and a variety of picture sizes, aspect ratios, etc. are permissible. The constrained parameters supported by the MPEG-1 standard are as follows: the maximum number of pixels per line is 720; the maximum number of lines per picture is 576; the maximum number of pictures per second, that is, the frame rate, is 30.
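The principle of sending only differences can be sketched in a few lines. This is an illustrative toy, with 1-D "frames" of pixel values rather than a real codec:

```python
def frame_difference(reference, current):
    """Temporal prediction: transmit only current-minus-reference residuals."""
    return [c - r for r, c in zip(reference, current)]

def reconstruct(reference, residual):
    """Decoder side: add the residual back onto the reference frame."""
    return [r + d for r, d in zip(reference, residual)]

reference = [10, 10, 10, 10, 50]
current   = [10, 10, 10, 10, 55]   # only one pixel changed between frames
residual  = frame_difference(reference, current)
# residual is [0, 0, 0, 0, 5]: mostly zeros, which entropy-codes far
# more compactly than retransmitting the whole frame
assert reconstruct(reference, residual) == current
```

The near-zero residual is exactly the redundancy between successive frames that temporal compression exploits.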
The maximum number of macroblocks per picture is 396; the maximum number of macroblocks per second is 9900; the maximum bit rate is 1.86 megabits per second; and the maximum decoder buffer size is 376832 bits. Now, the frames or pictures present in MPEG-1. The first is the intra-coded frame, or I-frame, which carries the entire image. These pictures are coded without reference to any other picture in the video sequence, which means the intra-coded frame comes first in the encoding. I-frames are the least compressible, but they do not require any other video frames in order to decode. Each frame is treated as a separate picture, and the Y, Cb and Cr matrices are encoded independently using the JPEG algorithm. Typically an I-frame requires more bits to encode than the other frame types. In the MPEG-1 standard, frames are encoded as I-pictures at regular intervals in order to enforce updating with the current content. The next frame type is the P-frame, also called a predicted picture. It holds only the information that changes in the image relative to the previous frame. For example, if we have a scene in which a car moves across a stationary background, we have to capture only the movement of the car; hence in the predicted picture, or P-frame, only the movement of the car is encoded. The encoder does not need to store the unchanging background pixels in the P-frame, which saves data. P-frames are also called delta frames. Decoding a P-frame requires the prior decoding of some other picture. A P-frame may contain image data, motion vector displacements, or a combination of both, and it can reference a previous picture in decoding order. Typically it requires fewer bits to encode than an I-picture. Now suppose the background is stationary but an object moves in front of it. Because of this, there are regions we cannot effectively code from the previous frame alone.
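The motion-vector idea behind P-frames can be illustrated with a toy block-matching search. This is a sketch on 1-D "frames" using the sum of absolute differences (SAD) as the matching cost; real encoders search 2-D macroblocks over a larger window:

```python
def motion_search(ref, cur, pos, size, search=2):
    """Find the shift of a block of cur that best matches ref (minimum SAD)."""
    block = cur[pos:pos + size]
    best_shift, best_sad = 0, float("inf")
    for shift in range(-search, search + 1):
        start = pos + shift
        if start < 0 or start + size > len(ref):
            continue                      # candidate falls outside the frame
        cand = ref[start:start + size]
        sad = sum(abs(a - b) for a, b in zip(block, cand))
        if sad < best_sad:
            best_shift, best_sad = shift, sad
    return best_shift, best_sad

ref = [0, 0, 9, 9, 0, 0, 0, 0]            # "car" occupies positions 2-3
cur = [0, 0, 0, 9, 9, 0, 0, 0]            # car has moved one pixel right
shift, sad = motion_search(ref, cur, pos=3, size=2)
# shift == -1, sad == 0: the block is found one position earlier in ref,
# so only a motion vector -- not the pixels -- needs to be transmitted
```

When the match is exact (SAD of zero), the P-frame stores just the displacement; when it is approximate, the small residual is encoded alongside the vector.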
For example, in this picture the car is moving from this point to this point. During this movement, when the car is at this point, we do not have information about the background it has just uncovered, and we have to rely on a future frame. So is it possible to decode the frame from the future? Predicting from the future frame alone is very difficult, and the previous frame does not contain the needed information. This is the problem that occurs in the case of the P-frame, and in order to remove it we can use bi-directionally predicted frames. Bi-directionally predicted frames have the best compression performance. They use bi-directional motion estimation with reference to the nearest coded I-frame and/or P-frame on either side of the B-picture in temporal order. To achieve a high compression ratio in the encoded bit stream, most of the frames in the video sequence are coded as B-frames. They typically require fewer bits to encode than I-frames or P-frames. Here is how a B-frame is formed. We have the past frame, the current frame and the future frame, in which we have motion in this direction. In the future frame this motion will have reached here, and in the current frame the motion is here. In the case of the B-frame, the past frame is added to the future frame, the average is calculated, and that average is subtracted from the current frame. The motion vectors are also added, and because of this we will have less information to encode. Now, the hierarchical structure in MPEG-1. The data structure is shown in this diagram. First we have the video sequence, then the groups of pictures, and each group of pictures consists of an I-frame and B-frames followed by a P-frame; in this way a combination of I-, B- and P-frames is available. After that, each picture is sliced into macroblocks.
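The averaging step in B-frame formation can be written out directly. This is a toy example assuming simple pixel-wise averaging of the two reference frames; a real codec averages motion-compensated blocks, not raw co-located pixels:

```python
def b_frame_residual(past, future, current):
    """B-frame: subtract the average of the past and future references."""
    prediction = [(p + f) / 2 for p, f in zip(past, future)]
    return [c - pr for c, pr in zip(current, prediction)]

past    = [0, 8, 0, 0]       # object at position 1
future  = [0, 0, 8, 0]       # object has moved to position 2
current = [0, 4, 4, 0]       # halfway between: a blend of both positions
residual = b_frame_residual(past, future, current)
# residual is all zeros: the bi-directional prediction is perfect here,
# so this B-frame costs almost nothing to encode
```

Because the prediction draws on references from both sides, B-frames tend to leave the smallest residuals, which is why they compress best.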
At the top of the data structure we have a video sequence, which consists of several groups of pictures. At the next level, each group of pictures begins with an I-frame; taking it as a reference, a P-picture is encoded N frames later than the I-picture in temporal order. So, on the basis of that I-picture a prediction is made, and the P-pictures are created, or encoded. The frames in between are encoded as B-pictures. So, by taking the I-picture and the P-picture as references, the B-pictures in between these two frames are produced. The P-picture in turn predicts the next P-picture that occurs N frames later. Typically N is a small number, of the order of three to four frames. A larger value of N would introduce more B-pictures, and we may therefore think it would be more efficient in terms of compression. However, a larger value of N makes the prediction of the P-picture worse, which in turn requires more bits to encode the prediction error. Typically, then, the value of N is between three and four for a group of pictures. In this way the data stream is transmitted. So, what are macroblocks? At the next level of the hierarchy, pictures are composed of slices, which are essentially sequences of macroblocks in raster-scan order and are designed for error recovery. Each macroblock is composed of one 16×16 block of luminance pixels and one 8×8 block of pixels from each of the Cr and Cb channels, as shown in the figure. So, this is the luminance channel, and these two are the Cr and Cb channels, which are 8×8 pixels. The 16×16 luminance block is further divided into four blocks of 8×8 pixels. These are the references for the session. Thank you.
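The macroblock partitioning can be checked with a little arithmetic. A sketch, assuming picture dimensions that are multiples of 16, and using the common 352×288 (SIF) picture size as the example:

```python
def macroblock_count(width, height):
    """Number of 16x16 macroblocks covering a picture."""
    return (width // 16) * (height // 16)

def blocks_per_macroblock():
    """Each macroblock: four 8x8 luminance blocks plus one 8x8 Cb and one 8x8 Cr."""
    luminance = (16 // 8) * (16 // 8)     # the 16x16 luma area splits into 4 blocks
    chrominance = 2                       # one 8x8 block each for Cb and Cr
    return luminance + chrominance

# 352x288 (SIF) gives 22 x 18 = 396 macroblocks -- exactly the
# constrained-parameter maximum of 396 macroblocks per picture
print(macroblock_count(352, 288))   # 396
print(blocks_per_macroblock())      # 6
```

Note how the 396-macroblock limit quoted earlier corresponds exactly to one full SIF picture, and each macroblock decomposes into six 8×8 blocks that feed the DCT stage.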