ReplayGain 1.0 specification

From Hydrogenaudio Knowledgebase
Revision as of 21:12, 11 December 2010 by Notat

The Problem

Not all CDs sound equally loud. The perceived loudness of mp3s is even more variable. Whilst different musical moods require that some tracks should sound louder than others, the loudness of a given CD has more to do with the year of issue or the whim of the producer than the intended emotional effect. If we add to this chaos the inconsistent quality of mp3 encoding, it's no wonder that a random play through your music collection can have you leaping for the volume control every other track.

The Solution

There is a remarkably simple solution to this annoyance: store the required replay gain for each track within the track itself. Such "data about data" is called metadata. It is already possible to store the title, artist, and CD track number within an mp3 file using the ID3 standard, and the later ID3v2 standard can also store a relative volume adjustment for the track, which can be used to "fix" quiet- or loud-sounding mp3s.

However, until now there has been no consistent standard, agreed on by mp3 encoders and players, for defining the appropriate replay gain, and no automatic way to set the volume adjustment for each track.

The Replay Gain proposal sets out a simple way of calculating and representing the ideal replay gain for every track and album.


Equal Loudness Filter

The human ear does not perceive sounds of all frequencies as equally loud. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 10 kHz, even though the two have identical energy. To account for this, the signal is filtered by an approximation of the inverse of the equal-loudness curves (sometimes referred to as Fletcher-Munson curves), so that frequencies the ear is less sensitive to contribute less to the measurement.
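The equal-loudness filter in the ReplayGain reference implementation is generally described as a 10th-order "Yule-Walk" IIR filter cascaded with a 2nd-order Butterworth high-pass at about 150 Hz. The Yule-Walk coefficients are sample-rate-specific tables and are not reproduced here, so the following is only a minimal sketch of the high-pass stage, assuming the Audio EQ Cookbook (bilinear transform) biquad design. The function names `butterworth_highpass` and `biquad` are hypothetical.

```python
import math


def butterworth_highpass(cutoff_hz, sample_rate):
    """Coefficients (b, a) of a 2nd-order Butterworth high-pass biquad,
    designed with the bilinear transform (Audio EQ Cookbook form) and
    normalised so that a[0] == 1."""
    q = 1.0 / math.sqrt(2.0)          # Butterworth quality factor
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0,
         -(1.0 + cos_w0) / a0,
         (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Filter a sequence with a Direct Form I biquad:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0               # shift input history
        y2, y1 = y1, y0               # shift output history
    return out


# Usage: high-pass a mono signal at 150 Hz before the RMS stage.
# b, a = butterworth_highpass(150.0, 44100)
# filtered = biquad(samples, b, a)
```

A constant (DC) input decays to zero and a 5 kHz tone passes essentially unchanged, while a 30 Hz tone is attenuated to a few percent of its amplitude. A complete equal-loudness filter would run the Yule-Walk stage before this one.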

RMS Energy Calculation

Next, the energy of the filtered signal over time is determined by calculating its root mean square (RMS) value over every 50 ms block.
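A minimal sketch of the 50 ms RMS step, assuming a mono signal of floats scaled to ±1.0. The function name `block_rms_db` is hypothetical, a trailing partial block is simply dropped, and each level is expressed in dB so that a mean square of 1.0 maps to 0 dB.

```python
import math


def block_rms_db(samples, sample_rate, block_ms=50.0):
    """RMS level in dB of each complete block_ms-long block of a mono
    signal; a trailing partial block is dropped."""
    block_len = int(round(sample_rate * block_ms / 1000.0))
    if block_len <= 0:
        raise ValueError("block length must be at least one sample")
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(s * s for s in block) / block_len
        # A tiny floor avoids log10(0) on digital silence (gives -200 dB).
        levels.append(10.0 * math.log10(mean_square + 1e-20))
    return levels


# Usage: one second at 44.1 kHz gives 20 blocks of 2205 samples each.
# levels = block_rms_db(filtered, 44100)
```

Working in dB does not change the ordering used by the statistical step, since the logarithm is monotonic.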

Statistical Processing

Where the average energy level of a signal varies with time, the louder moments contribute most to our perception of overall loudness. For example, in human speech over half the time is silence, but this does not affect the perceived loudness of the talker at all! For this reason, the RMS values are sorted into numerical order, and the value 5% of the way down from the loudest end of the list (the level exceeded by only 5% of the blocks, i.e. the 95th percentile) is chosen to represent the overall perceived loudness of the signal.
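The selection rule can be sketched as follows, assuming the input is the list of per-block RMS levels in dB. The function name `perceived_loudness_db` and the index rounding (truncating `n * fraction`) are illustrative choices, not taken from the specification.

```python
def perceived_loudness_db(levels_db, fraction=0.05):
    """Sort block levels from loudest to quietest and return the value
    `fraction` of the way down the list (0.05 -> 95th percentile)."""
    if not levels_db:
        raise ValueError("need at least one block level")
    ordered = sorted(levels_db, reverse=True)
    index = min(len(ordered) - 1, int(len(ordered) * fraction))
    return ordered[index]


# Speech-like example: 60 silent blocks and 40 blocks at -20 dB.
# The silence does not pull the result down; it stays at -20 dB.
# perceived_loudness_db([-200.0] * 60 + [-20.0] * 40)
```

For long tracks, a fixed-resolution histogram of block levels would avoid holding every value in memory; sorting is used here only because it mirrors the description above.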

Calibration with Reference Level

A suitable average replay level is 83 dB SPL. The SMPTE has defined a calibration relating the energy of a digital signal to the real-world replay level. Using this calibration, we subtract the loudness of the current signal (the value chosen in the previous step) from the desired (calibrated) level to give the difference, and it is this difference that is stored in the audio file.
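A minimal sketch of the calibration step and of how a player might use the stored value. It assumes `reference_db` is the level that this same analysis chain reports for the SMPTE calibration signal (commonly cited as pink noise at -20 dBFS RMS, per SMPTE RP 200, which should play at 83 dB SPL). That value must be measured, not guessed, so it is left as a parameter. Both function names are hypothetical.

```python
def stored_difference_db(measured_db, reference_db):
    """Difference written to the file: desired (calibrated) level minus the
    track's measured loudness. Positive means the track should be turned up."""
    return reference_db - measured_db


def gain_to_scale(gain_db):
    """Linear amplitude factor a player would multiply each sample by."""
    return 10.0 ** (gain_db / 20.0)


# Usage: a track measuring 6 dB louder than the reference gets -6 dB,
# i.e. its samples are scaled by about 0.501 on playback.
# diff = stored_difference_db(measured_db=-14.0, reference_db=-20.0)
# scale = gain_to_scale(diff)
```

With this sign convention, loud tracks receive negative values and quiet tracks positive ones; a positive gain can push peaks past full scale, so a player applying it needs some way to avoid clipping.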

Replay Gain

The calibration level of 83 dB can be added to the difference from the previous calculation to yield the actual Replay Gain. Note that the file stores this difference, NOT the actual Replay Gain.
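Writing $\Delta$ for the stored difference, $L_{\text{desired}}$ for the calibrated level, and $L_{\text{measured}}$ for the loudness chosen by the statistical step (the symbols are introduced here only for illustration), the two relationships in this section and the previous one read:

```latex
\Delta = L_{\text{desired}} - L_{\text{measured}},
\qquad
\text{Replay Gain} = 83\,\text{dB} + \Delta
```

Only $\Delta$ is written to the file; the 83 dB term is a fixed constant and can always be added back.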