Psychoacoustic Noise Suppression

Psychoacoustic noise suppression (PNS) takes into account the physiological and acoustic properties of the human auditory system when designing algorithms that improve the perceptual quality of a communication signal. The human ear is well adapted to the perception of human speech, which has nearly all of its energy concentrated between 300 Hz and 7 kHz. The pinna directs sound toward the outer ear canal and raises the sound level by around 10 dB for frequencies between 1.5 kHz and 7 kHz, right in the range of human speech. Two characteristics that are important in human hearing and in psychoacoustically motivated sound processing are loudness and masking.

The loudness characteristic refers to the fact that perceived loudness is a function of frequency. In other words, two tones with different frequencies but the same energy or sound pressure level (SPL) will be perceived as having different loudness. In general, lower frequencies require a higher SPL than higher frequencies to achieve the same loudness. Thanks to the design of the pinna, the range between 1.5 kHz and 7 kHz requires the lowest SPLs of the entire frequency spectrum.
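The frequency dependence described above can be illustrated with a short sketch. It uses Terhardt's widely cited closed-form approximation of the threshold in quiet, which is not taken from this article and is a related but different quantity from an equal-loudness contour: it gives the SPL at which a tone just becomes audible. The function name and the frequency grid are illustrative choices.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold in quiet (dB SPL) at f_hz, after Terhardt.

    Assumption: this closed-form fit stands in for the ear's
    frequency-dependent sensitivity; it is not an equal-loudness contour.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Scan an illustrative grid and find where the ear is most sensitive.
freqs = range(100, 16001, 100)
most_sensitive_hz = min(freqs, key=threshold_in_quiet_db)

for f in (100, 1000, 3000, 7000):
    print(f, "Hz ->", round(threshold_in_quiet_db(f), 1), "dB SPL")
print("lowest threshold near", most_sensitive_hz, "Hz")
```

Under this approximation the lowest threshold falls inside the 1.5 kHz to 7 kHz band, and low frequencies need a noticeably higher SPL to be heard at all.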

The masking effect describes the phenomenon where one signal masks a weaker signal and makes it inaudible. Although the human hearing organ has a fine frequency resolution for single tones (approximately 5 Hz), it does not perform as well in the presence of other tones. This is due to spectral leakage in the cochlea, the organ responsible for distinguishing frequencies. This phenomenon can be observed in the figure below. The strongest signal, at 1000 Hz, is the masking signal. The dotted line extending from this frequency represents the masking threshold. For another signal to be heard, it must contain enough energy to place it above this threshold. As can be observed, signals closer to the masking signal require more energy than signals farther away: the 1200 Hz tone is rendered inaudible, while the 1500 Hz tone remains audible.
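The masking example can be reproduced numerically with a minimal sketch. Everything specific here is an assumption rather than something stated in this article: the Zwicker-Terhardt Hz-to-Bark conversion, Schroeder's spreading function as the shape of the masking curve, a constant 10 dB masker-to-threshold offset, an 80 dB masker, and 60 dB probe tones. The threshold in quiet is ignored for simplicity.

```python
import math

def hz_to_bark(f_hz):
    """Convert frequency to the Bark scale (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function in dB; dz = probe Bark minus masker Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def masking_threshold_db(probe_hz, masker_hz, masker_db, offset_db=10.0):
    """Level a probe tone must exceed to be heard next to the masker.

    offset_db is a hypothetical constant; real models make it depend on
    frequency and on whether the masker is tonal or noise-like.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db + spreading_db(dz) - offset_db

def is_audible(probe_hz, probe_db, masker_hz=1000.0, masker_db=80.0):
    """True if the probe tone lies above the masking threshold."""
    return probe_db > masking_threshold_db(probe_hz, masker_hz, masker_db)

# Two equally strong probe tones (illustrative levels) next to a 1000 Hz masker.
for probe_hz in (1200.0, 1500.0):
    print(probe_hz, "Hz audible:", is_audible(probe_hz, 60.0))
```

With these assumed levels, the 1200 Hz tone falls below the threshold while the 1500 Hz tone rises above it, because the threshold drops as the distance from the masker grows.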

These concepts are used in psychoacoustic noise suppression. The main principle of PNS is “do not suppress noise below the level we can hear”. The reason for this is to avoid introducing the distortions that are typical of noise reduction routines (e.g. musical noise). PNS works in the same manner as most noise suppression algorithms, but an additional condition prevents suppression of components that would be inaudible anyway. Although the SNR improvement will be smaller with psychoacoustic noise suppression, the subjective quality will be improved. Also note that psychoacoustic-based audio compressors should not be used in conjunction with PNS. In an attempt to reduce the number of bits, audio compression schemes throw out any information that is considered inaudible, thus undermining the main principle of psychoacoustic noise suppressors.
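The additional condition can be sketched as a per-bin gain rule. This is a minimal illustration under stated assumptions, not the implementation of any particular product: the function name, the use of power spectral subtraction as the underlying suppressor, and the rule that residual noise is attenuated only down to the masking threshold are all hypothetical choices. Inputs are linear power values per frequency bin, and the masking threshold is assumed to come from a separate psychoacoustic model.

```python
import math

def psychoacoustic_gains(noisy_power, noise_power, mask_power, floor=0.1):
    """Return one amplitude gain per frequency bin.

    noisy_power: power of the noisy input in each bin
    noise_power: estimated noise power in each bin
    mask_power:  masking threshold (power) in each bin, from an assumed model
    floor:       hypothetical lower limit on the gain
    """
    gains = []
    for y, n, t in zip(noisy_power, noise_power, mask_power):
        if n <= t:
            # Noise is already masked: leave the bin untouched.
            gains.append(1.0)
            continue
        # Conventional power spectral subtraction gain.
        g = math.sqrt(max(1.0 - n / y, 0.0)) if y > 0.0 else 0.0
        # Attenuate the noise only down to the masking threshold:
        # g_min satisfies g_min**2 * n == t.
        g_min = math.sqrt(t / n)
        gains.append(max(g, g_min, floor))
    return gains

# Illustrative values: bin 0 is masked, bin 1 is limited by the threshold,
# bin 2 receives the full spectral subtraction gain.
gains = psychoacoustic_gains(
    noisy_power=[4.0, 1.2, 4.0],
    noise_power=[1.0, 1.0, 1.0],
    mask_power=[2.0, 0.5, 0.01],
)
print([round(g, 3) for g in gains])
```

Limiting the attenuation this way gives up some SNR improvement in exchange for fewer audible artifacts, which is the trade-off described above.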
