Video compression technology is a set of techniques for reducing and removing redundancy in video data, so that the compressed video occupies much less space than the uncompressed original. This allows the video to be saved in a smaller file or sent over a network more quickly. Compression efficiency is usually measured by the bitrate required for a given resolution and frame rate: the lower the bitrate at a given quality, the more efficient the compression.
Video compression may be lossy, in which case the image quality is reduced compared to the original. For lossy compression, the goal is to develop techniques that are efficient yet perceptually lossless: even though the compressed video differs from the original uncompressed video, the differences are not easily visible to the human eye.
Video data may be represented as a series of still frames, or fields for interlaced video. The sequence of frames will almost certainly contain both spatial and temporal redundancy that video compression algorithms can use. Most video compression algorithms use both spatial compression, based on redundancy within a single frame or field, and temporal compression, based on redundancy between different video frames.
Spatial compression techniques are based on still image compression. The most popular approach, adopted by many standards, is transform coding. The image is split into blocks and a transform is applied to each block; the resulting coefficients are scaled and quantized. The quantized data is then compressed by a lossless entropy encoder, and the output bitstream is formed from the result. The most popular transform is the Discrete Cosine Transform (DCT) or its modifications. There are many other algorithms for spatial compression, such as the wavelet transform, vector quantization, and fractal compression.
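The transform-and-quantize step above can be sketched in a few lines of Python. This is a minimal illustration, not any standard's actual codec path: it applies a direct (unoptimized) 2-D type-II DCT to one block and then a simple uniform scalar quantizer with an assumed step size, which drives most high-frequency coefficients to zero so the entropy coder can compress them efficiently.

```python
import math

def dct_2d(block):
    """Direct 2-D type-II DCT of an N x N block (clear but O(N^4); real
    codecs use fast separable or integer-approximation transforms)."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, step=16):
    """Uniform scalar quantization (step=16 is an arbitrary example value);
    small high-frequency coefficients round to zero and compress well."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat 8x8 block: all energy collapses into the single DC coefficient.
block = [[128] * 8 for _ in range(8)]
q = quantize(dct_2d(block))
print(q[0][0])  # DC term; every AC coefficient quantizes to 0
```

A block with no detail compresses to essentially one number, which is exactly the redundancy the transform is designed to expose.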
Temporal compression can be a very powerful method. It works by comparing different frames in the video to each other. If the video contains areas without motion, the system can issue a short command that copies that part of the previous frame, bit-for-bit, into the next one. If some of the pixels have changed (moved, rotated, changed in brightness, etc.) with respect to the reference frame or frames, a prediction technique can be applied. For each area in the current frame, the algorithm searches for a similar area in the previous frame or frames. If a similar area is found, it is subtracted from the current area and the difference is encoded by the transform coder. The reference for the current frame area may also be obtained as a weighted sum of corresponding areas from previous and subsequent frames. If subsequent frames are used, the current frame must be delayed by some number of frame periods.
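The search for a similar area described above is commonly implemented as block-matching motion estimation. The sketch below (an assumed minimal full-search variant, not any particular standard's algorithm) compares a block in the current frame against every candidate position within a small search radius in the reference frame, using the sum of absolute differences (SAD) as the similarity cost, and returns the best motion vector.

```python
def sad(cur, ref, cx, cy, rx, ry, bs):
    """Sum of absolute differences between the bs x bs block at (cx, cy)
    in the current frame and the candidate at (rx, ry) in the reference."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(bs) for x in range(bs))

def motion_search(cur, ref, cx, cy, bs=8, radius=4):
    """Full search over all displacements within +/-radius pixels.
    Returns the best motion vector (dx, dy) and its SAD cost."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, bs)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx and rx + bs <= w and 0 <= ry and ry + bs <= h:
                cost = sad(cur, ref, cx, cy, rx, ry, bs)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost

# Synthetic example: the current frame is the reference shifted right by 2
# pixels and down by 1, so the best vector points back at (-2, -1).
ref = [[(x * 7 + y * 13) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]
mv, cost = motion_search(cur, ref, 4, 4, bs=4, radius=4)
print(mv, cost)
```

A perfect match (cost 0) means the encoder only needs to transmit the motion vector; otherwise the residual difference is passed to the transform coder. Real encoders replace this exhaustive search with faster heuristics.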
There are a number of proprietary and industry video encoding standards. Almost all widely used standards are based on DCT transform techniques. The most popular standards are shown in Table 1.
Table 1: Popular video compression standards
| Standard | Developed by | Year |
|---|---|---|
| MPEG-1 Part 2 | ISO/IEC | 1993 |
| MPEG-2 Part 2 (H.262) | ISO/IEC, ITU-T | 1995 |
| H.264/AVC (MPEG-4 Part 10) | ISO/IEC, ITU-T | 2003 |