Video Compression Techniques, MPEG-2 and MPEG-4 Standards

This is the learning outcome of the session. Let me ask you one question: what are the two methods of video compression? You can pause the video, think about the question, write down your answer in your notebook, and then resume the video to get the answer. Now, let me answer the question. There are basically two methods of video compression: the first is spatial compression and the other is temporal compression.

The objective of the MPEG-2 standard. The MPEG-2 standard is basically designed for the compression, encoding and transmission of high-quality video over multi-channel media for terrestrial broadcasting and digital or cable TV distribution. The compression is also intended for broadband networks, in order to transfer video over the internet. The standard defines profiles and levels, which are subsets of the syntax, in order to suit a wide range of applications, to scale the bit stream, and to provide error-resilience capability. It has backward compatibility with MPEG-1.

MPEG-2 supports five profiles, in decreasing order of hierarchy: the High profile, the Spatially Scalable profile, the SNR (signal-to-noise ratio) Scalable profile, the Main profile and the Simple profile. Each profile adds a new set of algorithms and acts as a superset of the profile below it. A level specifies a range of parameters supported by an implementation, that is, image size, frame rate and bit rate. MPEG-2 supports four levels: High, High-1440, Main and Low.

In the case of the High profile, all the functionality of the Spatially Scalable profile is provided, plus it includes three layers of SNR and spatially scalable coding. It also supports 4:2:2 YUV representation.
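The two compression methods above can be illustrated with a minimal sketch (not part of any standard; the function names and toy frames are purely illustrative). Spatial compression removes redundancy within one frame, while temporal compression removes redundancy between consecutive frames:

```python
def spatial_compress(frame):
    """Toy intra-frame coding: run-length encode identical neighbouring pixels."""
    runs = []
    for pixel in frame:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs

def temporal_compress(prev_frame, frame):
    """Toy inter-frame coding: store only the difference from the previous frame."""
    return [cur - ref for ref, cur in zip(prev_frame, frame)]

frame1 = [10, 10, 10, 20, 20, 30]
frame2 = [10, 10, 12, 20, 20, 30]   # almost identical to frame1

print(spatial_compress(frame1))           # [[10, 3], [20, 2], [30, 1]]
print(temporal_compress(frame1, frame2))  # [0, 0, 2, 0, 0, 0] - mostly zeros
```

Real codecs use transforms and motion compensation rather than run lengths and raw differences, but the principle is the same: the mostly-zero temporal difference compresses far better than the frame itself.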
In the case of the Spatially Scalable profile, all the functionality of the SNR Scalable profile is provided, plus two layers of spatially scalable coding and 4:2:0 YUV representation. The SNR Scalable profile provides all the functionality of the Main profile, plus two layers of SNR scalable coding and 4:2:0 YUV representation. In the case of the Main profile, all the functionality of the Simple profile is provided, plus it includes interlaced video coding, random access and B-picture (bidirectional) prediction modes, and it supports 4:2:0 YUV representation. The Simple profile does not support B-picture prediction.

The levels: there are four levels, Low, Main, High-1440 and High. In the case of the Low level, the resolution is 352 x 288 and the maximum frame rate is 30 frames per second. The maximum coded data rate is 4 Mbps, and it is used for consumer tape recording. The Main level provides standard-definition resolution, that is 720 x 576, at 30 frames per second with a data rate of 15 Mbps, and it is used for studio television. The High-1440 level, with a resolution of 1440 x 1152, supports 60 frames per second and a 60 Mbps data rate, and it is used for high-definition television. And the High level supports a full resolution of 1920 x 1152 at 60 frames per second with a data rate of 80 Mbps; it is used for film production.

What is the difference between MPEG-2 and MPEG-1? At the sequence layer, MPEG-2 supports interlaced scanning and a 16:9 aspect ratio. The syntax can support a single frame size of up to nearly 16,000 x 16,000 pixels, and the picture dimensions must be multiples of 16 pixels. At the picture layer, all MPEG-2 motion vectors are always half-pixel accurate, whereas MPEG-1 provides one-pixel accuracy. A DC coefficient can be coded with 8, 9, 10 or 11 bits, whereas MPEG-1 always uses 8 bits for the coding.
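The level parameters above can be collected into a small lookup table (the figures are the ones quoted in the lecture; the dictionary and the `fits_level` helper are illustrative, not part of the MPEG-2 standard):

```python
# MPEG-2 level constraints as quoted in the lecture.
MPEG2_LEVELS = {
    "Low":       {"resolution": (352, 288),   "max_fps": 30, "max_mbps": 4,  "use": "consumer tape recording"},
    "Main":      {"resolution": (720, 576),   "max_fps": 30, "max_mbps": 15, "use": "studio television"},
    "High-1440": {"resolution": (1440, 1152), "max_fps": 60, "max_mbps": 60, "use": "high-definition television"},
    "High":      {"resolution": (1920, 1152), "max_fps": 60, "max_mbps": 80, "use": "film production"},
}

def fits_level(level, width, height, fps):
    """Check whether a video format fits within a level's size and rate limits."""
    c = MPEG2_LEVELS[level]
    max_w, max_h = c["resolution"]
    return width <= max_w and height <= max_h and fps <= c["max_fps"]

print(fits_level("Main", 720, 576, 25))    # True  - PAL SD fits Main level
print(fits_level("Main", 1920, 1080, 25))  # False - full HD needs the High level
```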
MPEG-2 includes non-linear macroblock quantization, which gives a more dynamic step size, ranging from 0.5 to 56, whereas MPEG-1 supports 1 to 32; this is good for high-rate, high-quality video. It includes interlaced scanning, in which the odd and even lines are scanned separately. If the input is interlaced, the output of the encoder consists of a sequence of fields that are separated by one field period.

MPEG-2 supports two new picture formats, called the frame picture and the field picture. In the case of the field picture, every field is coded separately: each field is divided into non-overlapping macroblocks and the discrete cosine transform is applied on a field basis. In a frame picture, two fields are coded together to form a frame, which is similar to the conventional coding of a progressive video sequence. Frame pictures are preferred for relatively still images, whereas field pictures give better results in the presence of significant motion, and it is possible to switch between frame pictures and field pictures on a frame-by-frame basis. Each frame or field picture may be I type, P type or B type.

MPEG-2 frame sizes. Generally, the I frame is of size 50 kilobytes and the compression ratio is about 10:1. The P frame, that is the predictive frame, is of size 25 KB, with a compression ratio of 20:1, and the B frame, that is the bidirectional frame, is of size 10 kilobytes, with a compression ratio of about 50:1.

Next is MPEG-4. MPEG-4 coding is based on the object-based coding concept, in which arbitrarily shaped and dynamically changing individual visual or audio objects in a video sequence can be individually coded, manipulated and transmitted through independent bit streams. Generally, the transmission speed ranges from 5 to 64 kbps up to a maximum of 2 Mbps for TV or film applications in MPEG-4.
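The frame-size figures above can be turned into a worked example. Using the lecture's numbers (I ≈ 50 KB, P ≈ 25 KB, B ≈ 10 KB, each from a ~500 KB raw frame), here is the average compression over one typical 12-frame group of pictures; the GOP pattern IBBPBBPBBPBB is a common choice, not something the lecture specifies:

```python
# Frame sizes in KB, as quoted in the lecture.
FRAME_KB = {"I": 50, "P": 25, "B": 10}

gop = "IBBPBBPBBPBB"                  # one I, three P, eight B frames
gop_kb = sum(FRAME_KB[t] for t in gop)
uncompressed_kb = 500 * len(gop)      # ~500 KB per raw frame (50 KB x 10:1)

print(gop_kb)                               # 205 KB for 12 frames
print(round(uncompressed_kb / gop_kb, 1))   # overall ratio of about 29.3:1
```

The point of the arithmetic: because most frames in a GOP are cheap B frames, the overall ratio is far better than the 10:1 achieved by the I frame alone.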
The widespread applications of MPEG-4 are internet streaming, wireless video and digital video cameras, and it is also used in mobile phones and mobile palm computers.

Next are the basic objectives of the MPEG-4 standard. It supports content-based manipulation and bit-stream editing. It has the ability to combine synthetic scenes or objects with natural scenes and objects. It provides efficient random access to video frames or objects. It provides better visual quality at comparable bit rates than its earlier standards, that is, MPEG-1 and MPEG-2. It has the ability to encode multiple views, for example stereoscopic video. It provides error robustness to allow access over a variety of wireless and wired networks, and also storage media.

In MPEG-4, the audio and video data are content-based, which allows independent access and manipulation of audio-visual objects in the compressed domain. Transformations of existing objects (repositioning, scaling, rotation), addition of new objects, removal of existing objects, etc. are possible in the MPEG-4 environment. These object manipulations are possible through simple operations performed on the bit stream. The audio-visual objects are layered, and each layer is encoded into an elementary stream of bits.

For content representation using VOPs, an input video sequence is segmented into a number of arbitrarily shaped regions, which are called VOPs (video object planes). Each of the regions possibly covers a particular image or video content of interest. The shape and the location of a region can vary from frame to frame. The shape, motion and texture information of the VOPs belonging to the same video object is encoded and transmitted in a video object layer (VOL).
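The segmentation idea can be sketched with a binary alpha plane on a toy frame (the frame, mask and `extract_vop` helper are illustrative; real VOPs carry shape, motion and texture data, not `None`-padded pixel grids):

```python
# A tiny 3x4 "frame" of pixel values.
frame = [
    [5, 5, 9, 9],
    [5, 9, 9, 5],
    [5, 5, 5, 5],
]
# Binary alpha plane for the same frame: 1 = foreground object, 0 = background.
alpha = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def extract_vop(frame, alpha, fg=True):
    """Keep the pixels selected by the alpha plane; blank the rest with None."""
    keep = 1 if fg else 0
    return [[p if a == keep else None for p, a in zip(frow, arow)]
            for frow, arow in zip(frame, alpha)]

foreground_vop = extract_vop(frame, alpha, fg=True)
background_vop = extract_vop(frame, alpha, fg=False)
print(foreground_vop[0])  # [None, None, 9, 9]
print(background_vop[0])  # [5, 5, None, None]
```

The same mask splits every frame into a foreground VOP and a background VOP, and a decoder holding both VOLs plus the mask can recompose the original scene.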
Since there are typically several video objects, the bit stream should also include information on how to combine the different VOLs to reconstruct the video. The video sequence is composed of one or more audio-visual objects. A VO can be either an audio object resulting from speech, music, sound effects, etc., or a video object representing specific content such as a moving object, a static or moving background, etc. A video object may be present over a large collection of frames. The snapshot of a video object in one frame is defined as a video object plane, that is, a VOP.

As shown in the figure, the background is discarded in order to detect the video object. To get the video object, the video sequence is segmented into an arbitrarily shaped foreground VOP and a background VOP, together with a binary alpha plane for the same frame, which is a binary segmentation mask specifying the location of the foreground content in the VOP. As shown in the figure, we can locate the object and the background.

Content-based encoding and decoding work as follows. First, a scene is divided into different objects. From each object, the VOPs are identified, and then encoded and transmitted over the network. At the decoding side, each bit stream is converted back into its object, and these objects are combined in order to reconstruct the scene.

Each VOL encoding has three components: shape coding (that is, contour coding), motion estimation and texture coding. As shown in the figure, this is the contour of the image. A contour macroblock is a macroblock which lies on the edge of the object. A standard macroblock is a macroblock which lies inside the object; it does not cross the edge. And this is the total VOP window. The VOP image window is composed of macroblocks of size 16 x 16 pixels.
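The macroblock partitioning just described can be sketched as a classification over the binary alpha plane (an illustrative sketch only; the block size is shrunk to 2 x 2 for readability, whereas MPEG-4 uses 16 x 16):

```python
MB = 2  # macroblock size (16 in the real standard)

# Binary alpha plane for one VOP: 1 = inside the object, 0 = outside.
alpha = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

def classify_macroblocks(alpha, mb=MB):
    """Label each macroblock as outside, boundary (contour) or standard."""
    classes = []
    for y in range(0, len(alpha), mb):
        row = []
        for x in range(0, len(alpha[0]), mb):
            pixels = [alpha[y + j][x + i] for j in range(mb) for i in range(mb)]
            if all(pixels):
                row.append("standard")   # fully inside the VOP
            elif any(pixels):
                row.append("boundary")   # contour macroblock, needs shape coding
            else:
                row.append("outside")    # outside the VOP, not encoded
        classes.append(row)
    return classes

print(classify_macroblocks(alpha))
# [['outside', 'standard'], ['boundary', 'boundary']]
```

Only the boundary macroblocks need shape information; standard macroblocks are coded like conventional macroblocks, and outside macroblocks are skipped entirely.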
The macroblocks which do not belong to the VOP at all are exterior macroblocks with respect to the VOP and are not encoded in the VOL. The macroblocks which partially belong to the VOP are the boundary macroblocks of the VOP and require some special consideration during the encoding. And the macroblocks which fully belong to the VOP are the standard macroblocks of the VOP.

These are the references for the session. Thank you.