High-frequency content in MP3s

From Hydrogenaudio Knowledgebase
Revision as of 16:07, 10 June 2021 by 82.51.138.162 (talk)

The amount of high-frequency content in an MP3 cannot be used to reliably predict the quality of the MP3.[1] Therefore, Hydrogenaudio users advise against using frequency graphs to assess MP3 quality; "we don't listen with our eyes".

MP3 quality can only be assessed by listening tests, such as checking for transparency with ABX testing.
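As a rough illustration of how an ABX result is usually interpreted, the sketch below computes the chance of scoring at least a given number of correct identifications by pure guessing (a one-sided binomial test). The function name `abx_p_value` and the 16-trial session length are assumptions chosen for the example, not part of any particular ABX tool.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Each trial is a 50/50 guess, so sum the binomial tail and divide by 2**trials.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical 16-trial session with different scores.
for correct in (8, 12, 14, 16):
    print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value suggests the listener really did hear a difference; a large one means the result is consistent with guessing, which is what transparency looks like in such a test.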

Why MP3s omit high-frequency content

MP3 encoders typically "lowpass" their input, cutting off the highest frequencies. There are several reasons for this:

  • Human hearing tends to drop off sharply somewhere between 16 and 20 kHz, usually toward the low end of that range.
  • The input to the encoder is unlikely to have much, if any, audible, musical (non-noise-like) high-frequency content.
  • Even if the source does have audible musical content above 16 kHz, preserving it would take away valuable space that could be used by the lower, more important frequency bands.
  • The MP3 format has difficulty storing content above 16 kHz without sacrificing quality and increasing the bitrate requirements of the lower frequency bands.

Details follow.

The limits of human hearing

Human hearing sensitivity peaks in the low kilohertz range (roughly 2 to 5 kHz) and drops from there. Even children and young people under 20 years old can't hear above about 20 kHz, and this upper limit decreases with age. Many people can't hear anything above ~18 kHz, even isolated test tones. This affords an opportunity to limit the frequencies an MP3 encoder cares about to just those that humans can hear.
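To make the idea of a lowpass concrete, here is a minimal sketch of a generic windowed-sinc FIR lowpass filter and its gain at a few frequencies. It is only an illustration of lowpass filtering in general, not how LAME or any other MP3 encoder implements it; the 44.1 kHz sample rate, 16 kHz cutoff, 101 taps, and the function names are all assumptions made for the example.

```python
import cmath
import math

def design_lowpass(cutoff_hz, sample_rate_hz, num_taps=101):
    """Return windowed-sinc FIR lowpass coefficients (Hamming window)."""
    fc = cutoff_hz / sample_rate_hz          # normalized cutoff, cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal lowpass impulse response (a sinc), with the x == 0 limit handled.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]         # normalize so the gain at 0 Hz is 1

def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))

SAMPLE_RATE = 44100                          # assumed CD-style sample rate
taps = design_lowpass(16000, SAMPLE_RATE)
for f in (1000, 10000, 16000, 19000):
    print(f"{f:>6} Hz: gain {gain_at(taps, f, SAMPLE_RATE):.4f}")
```

Frequencies well below the cutoff pass through essentially unchanged, while those well above it are strongly attenuated, so the encoder no longer has to spend bits on them.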

The point of lossy formats like MP3 is to achieve transparency while saving space, making only those sacrifices that change the audio in minimally audible ways. Removing what is likely to be ultrasonic content is an effective way to achieve that goal. If you're not concerned about saving space, or if you regard any sacrifice to the audio, however inaudible, as a loss of quality, then for peace of mind you should not be using MP3 at all; use a lossless format instead.

Limited high-frequency content in music

The characteristics of music present further reasons to lowpass the input to an MP3 encoder.

Just as you can't hear a mosquito buzzing when firing a cannon, quiet sounds are masked by louder ones. The Wikipedia article on auditory masking explains this in greater detail.

Musical instruments produce sound within the range of 40 Hz to about 16 kHz. Generally, each instrument produces a loud, relatively low-frequency fundamental tone, accompanied by numerous quieter overtones at higher frequencies. Although a few instruments (cymbals, trumpets) may produce overtones at higher frequencies, there is so little acoustic energy above 16 kHz that those tones tend to be masked by the much louder sounds at lower frequencies. Consequently, most people can't distinguish music that's missing frequencies above 16 kHz from music that isn't. When the difference is noticeable, it tends to be only in loud transients, such as percussion hits. Accordingly, well-designed MP3 encoders allow high-frequency content through only when it's sufficiently loud or would not be masked.

High-frequency noise is problematic

For various reasons, electric and electronic instruments, as well as analog recording equipment and mixing consoles, typically impart electrical noise and other "hiss" which, while quiet and often masked, extends well into ultrasonic frequencies. Such noise is a relatively complex signal which can be difficult and wasteful to encode in MP3s. Lowpass filtering frees up the encoder to devote more space and quality to the lower frequencies you can hear and to which your ears are far more sensitive.

It mustn't be assumed that high-frequency content is musical, or that the presence of such content would make a listener with sensitive hearing report higher quality.

The scalefactor band 21 problem

The MP3 format has a technical limitation that forces a trade-off: the more accurately the highest frequency band (normally 16 kHz and up) is encoded, the more space is required to encode the lower frequency bands at similar quality. In other words, if the highest frequencies are preserved well, the quality of the much more audible lower frequencies suffers unless the bitrate is increased significantly to compensate. Full compensation is not always possible, because MP3 bitrates are capped at 320 kbps.
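The commonly described cause of this limitation is that the highest scalefactor band (sfb21) has no scalefactor of its own, so its precision can only be raised by making the global quantization step finer, which affects every band. The toy model below sketches that mechanism only. It is not the real MP3 quantizer (which is non-uniform), and the band levels, step sizes, and bit-cost formula are invented assumptions for illustration.

```python
import math

def estimated_bits(levels, global_step, scalefactors):
    """Toy cost model: bits grow with level / effective step size."""
    bits = 0.0
    for level, sf in zip(levels, scalefactors):
        step = global_step / 2 ** sf     # a scalefactor refines only its own band
        bits += math.log2(1 + level / step)
    return bits

LOWER_BANDS = 21                          # bands that have their own scalefactor
levels = [100.0] * LOWER_BANDS + [10.0]   # hypothetical levels; last entry is the top band
no_refinement = [0] * (LOWER_BANDS + 1)   # the top band's entry must stay 0: no scalefactor

# Coarse global step: assumed adequate for the lower bands, crude for the top band.
coarse = estimated_bits(levels, global_step=4.0, scalefactors=no_refinement)

# Finer top band: the only lever is the global step, so every band gets finer too.
fine = estimated_bits(levels, global_step=1.0, scalefactors=no_refinement)

top_only = math.log2(1 + 10.0 / 1.0) - math.log2(1 + 10.0 / 4.0)
print(f"extra bits overall:        {fine - coarse:.1f}")
print(f"extra bits in top band:    {top_only:.1f}")
print(f"extra bits in lower bands: {fine - coarse - top_only:.1f}")
```

In this model most of the extra cost lands on the lower bands, which did not need the finer step, rather than on the top band that motivated it. That is the shape of the trade-off described above.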

The LAME Y switch article explains this "sfb21 defect" in greater detail, as does LAME developer Gabriel Bouvigne's MP3 Limitations document.

A well-designed MP3 encoder will be judicious in its handling of sfb21, only encoding high-frequency content when it's possible to do so without a significant adverse impact on the quality of the lower frequencies.

Notes

  1. In fact, visible differences between spectrograms (graphs of frequency content over time) are not reliable indicators of audible differences in quality.