H.264/AVC is a popular and efficient video compression standard, so digital watermarking of H.264 streams is an important topic. A watermark can carry copyright, video tagging, control or other information; in general, it can be an image (logo, copyright image, etc.), text or control data.

Before insertion into H.264 compressed video, the watermark should be preprocessed. The preprocessing depends on the nature of the watermark. For example, a small image watermark could be represented as a sequence of binary values taken directly from the image, or as the transform coding (e.g. DCT) of the watermark image followed by post-processing (quantization, masking, entropy encoding, etc.). In either case, the result of watermark preprocessing is a sequence of bits.
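As an illustration of the simplest preprocessing path, the sketch below turns a small grayscale image or a text string directly into a bit sequence. The function names, the threshold of 128 and the row-by-row scan order are assumptions made for this example, not part of any particular watermarking scheme.

```python
def image_to_bits(image, threshold=128):
    """Flatten a grayscale image (rows of 0..255 values) into bits, row by row.

    Assumption: pixels at or above `threshold` become 1, all others 0.
    """
    return [1 if pixel >= threshold else 0 for row in image for pixel in row]


def text_to_bits(text, encoding="utf-8"):
    """Turn text into bits, most significant bit of each byte first."""
    return [
        (byte >> shift) & 1
        for byte in text.encode(encoding)
        for shift in range(7, -1, -1)
    ]


def bits_to_bipolar(bits):
    """Map bits {0, 1} to the bipolar values {-1, +1}."""
    return [2 * b - 1 for b in bits]


logo = [[0, 255], [200, 30]]          # hypothetical 2x2 logo
logo_bits = image_to_bits(logo)        # one bit per pixel
logo_signal = bits_to_bipolar(logo_bits)
```

The bipolar mapping is included because the embedding step is commonly described in terms of ±1 values rather than raw bits.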

The H.264/AVC standard uses a 4×4 integer transform, an approximation of the DCT whose core can be computed with additions and shifts only, without multiplications. In many cases a digital watermark signal is embedded into the 4×4 transformed luma blocks of I-frames.
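To show why no multiplications are needed, here is a minimal sketch of the 4×4 forward core transform built from additions, subtractions and one-bit shifts. It covers only the core matrix product; the scaling that the standard folds into quantization is deliberately left out, and the function names are chosen for this example.

```python
def forward_1d(a, b, c, d):
    """One 4-point butterfly of the core transform (adds and shifts only)."""
    s0, s1 = a + d, b + c      # sums of outer and inner samples
    d0, d1 = a - d, b - c      # differences of outer and inner samples
    return (
        s0 + s1,               #  a +  b +  c +  d
        (d0 << 1) + d1,        # 2a +  b -  c - 2d
        s0 - s1,               #  a -  b -  c +  d
        d0 - (d1 << 1),        #  a - 2b + 2c -  d
    )


def core_transform_4x4(block):
    """Apply the butterfly to every row, then to every column of the result."""
    rows = [forward_1d(*row) for row in block]
    cols = [forward_1d(*col) for col in zip(*rows)]
    # `cols` holds transformed columns, so transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


flat = [[1, 1, 1, 1] for _ in range(4)]
coeffs = core_transform_4x4(flat)   # all energy ends up in the DC position
```

For a constant block, only the top-left (DC) coefficient is non-zero, which is the behaviour expected of a DCT-like transform.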

Any subset of the DCT blocks can be chosen for watermark embedding. In each of the chosen blocks, only one coefficient is modified: a chosen middle-frequency coefficient at a diagonal position. The encoded watermark may be spread out before insertion so that each bit in the watermark sequence is repeated *s* times (where *s* is an integer). Before insertion, the spread-out bits may be permuted by a random sequence. Therefore, the output watermark sequence may be represented as:

*w*’_{i} = *p*_{i} ∙ *w*_{k}, for *ks* ≤ *i* < (*k*+1)*s*,

where *w*_{k} is the input watermark sequence, *s* is the spreading factor and *p*_{i} is the permutation sequence. The chosen diagonal middle frequency is replaced by a value proportional to the watermark signal.
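The spreading rule can be sketched in a few lines. This example assumes that *p*_{i} is a pseudo-random sequence of ±1 values generated from a seed shared by the embedder and the detector; the text does not specify how the sequence is produced, so `make_sequence` and its seed are hypothetical.

```python
import random


def make_sequence(length, seed):
    """Hypothetical generator for p_i: pseudo-random values from {-1, +1}."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(watermark, s, p):
    """Compute w'_i = p_i * w_k for k*s <= i < (k+1)*s."""
    return [p[i] * watermark[i // s] for i in range(len(watermark) * s)]


watermark = [1, -1]                     # input sequence w_k as +/-1 values
s = 3                                   # spreading factor
p = make_sequence(len(watermark) * s, seed=2024)
spread_signal = spread(watermark, s, p)
```

The integer division `i // s` picks the index *k* whose range *ks* ≤ *i* < (*k*+1)*s* contains *i*, so each input value contributes exactly *s* output values.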

In some cases this value is additionally scaled by a gain factor that depends on the image activity in the region of the particular DCT block. For example, the gain factor should be decreased for smooth or bright regions since the human eye is more sensitive to distortions there.

In many cases the watermark signal is represented by the values ±1. The chosen middle-frequency sample in the DCT block is then replaced by *H*_{i} = *w*’_{i}∙*K*_{1}∙*K*_{2}, where *w*’_{i} is the spread watermark signal (±1), *K*_{1} is the local coefficient that depends on the DCT block activity and *K*_{2} is the global coefficient.
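A minimal sketch of the replacement step follows. Several details are assumptions made for illustration only: the diagonal position (2, 2), the activity measure (sum of absolute AC coefficients), the threshold and the two gain levels are all hypothetical, brightness is not modelled, and quantization is ignored by treating coefficients as plain integers.

```python
WATERMARK_POS = (2, 2)  # assumption: one diagonal mid-frequency position


def local_gain(block, pos=WATERMARK_POS, threshold=64, smooth_gain=1, busy_gain=2):
    """Hypothetical K1: a lower gain for smooth blocks, a higher one for busy blocks.

    The DC coefficient and the watermark position are excluded from the
    activity measure so that the result is the same before and after embedding.
    """
    activity = sum(
        abs(coeff)
        for r, row in enumerate(block)
        for c, coeff in enumerate(row)
        if (r, c) not in ((0, 0), pos)
    )
    return busy_gain if activity >= threshold else smooth_gain


def embed_bit(block, w, k2, pos=WATERMARK_POS):
    """Return a copy of `block` with the chosen coefficient set to w * K1 * K2."""
    k1 = local_gain(block, pos)
    marked = [list(row) for row in block]
    r, c = pos
    marked[r][c] = w * k1 * k2
    return marked


smooth = [[100, 0, 0, 0], [0, 0, 0, 0], [0, 0, 7, 0], [0, 0, 0, 0]]
marked = embed_bit(smooth, w=-1, k2=4)   # coefficient at (2, 2) is replaced
```

Excluding the watermark position from the activity measure is a design choice of this sketch: it lets a detector recompute the same *K*_{1} from the marked block without access to the original.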

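To extract the watermark, the chosen frequency coefficients are read from the same DCT blocks, normalized by *K*_{1} and inverse-permuted; then all coefficients that resulted from spreading the same *w*_{k} are added together. The estimated watermark is equal to the following sum:

*ŵ*_{k} = Σ *H*_{i} / (*p*_{i}∙*K*_{1}), for *ks* ≤ *i* < (*k*+1)*s*,

where *H*_{i} is the coefficient extracted from the *i*-th chosen block and *K*_{1} is the local coefficient of that block.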

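This algorithm can maintain both a good transparency level (little visible loss of video quality) and a good robustness level: the activity-dependent gain limits the visibility of the embedded signal, while spreading each bit over *s* coefficients lets the detector average out distortions.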
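The extraction steps and the effect of spreading can be sketched as a small round trip. The sketch assumes that *p*_{i} takes the values ±1 (so undoing the permutation is a multiplication by the same *p*_{i}), that the detector knows the local gains, and that the final decision is the sign of the sum. The damage model, in which every third coefficient has its sign inverted, is purely hypothetical.

```python
import random


def despread(coeffs, gains, p, s):
    """Sum p_i * H_i / K1_i over k*s <= i < (k+1)*s for every k."""
    return [
        sum(p[i] * coeffs[i] / gains[i] for i in range(k * s, (k + 1) * s))
        for k in range(len(coeffs) // s)
    ]


def decide(sums):
    """Assumed decision rule: the sign of each sum gives the +/-1 value."""
    return [1 if total >= 0 else -1 for total in sums]


rng = random.Random(7)
watermark = [1, -1, -1, 1]
s, k2 = 9, 4                                   # spreading factor, global gain
n = len(watermark) * s
p = [rng.choice((-1, 1)) for _ in range(n)]    # assumed +/-1 sequence
gains = [rng.choice((1, 2)) for _ in range(n)] # stand-in local gains K1

# Embedded coefficients H_i = w'_i * K1 * K2 with w'_i = p_i * w_k.
coeffs = [p[i] * watermark[i // s] * gains[i] * k2 for i in range(n)]

# Hypothetical damage: every third coefficient has its sign inverted.
damaged = [-h if i % 3 == 0 else h for i, h in enumerate(coeffs)]

sums = despread(damaged, gains, p, s)
recovered = decide(sums)
```

Without damage each sum would equal *s*∙*K*_{2}∙*w*_{k}. Here three of the nine terms per value are inverted, so the remaining six still outweigh them and the sign of every sum matches the original watermark.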