Joint stereo refers to any stereo-encoding method that goes beyond simple encoding as two independent channels ("simple" or "L/R" stereo or DualMono). These methods exploit the similarities between channels and typically let the encoder spend its bits more effectively, increasing audio quality for a given bitrate. They are not guaranteed to be transparent, however, and can instead introduce audible artifacts.
Some file formats, such as MP3, can switch among these modes on the fly, on a frame or sub-frame basis, for the sake of efficiency or quality. For example, a high-bitrate "joint stereo" MP3 file may contain a mixture of SS and MS frames, or it may contain all SS frames or all MS frames. For historical reasons, the term "joint stereo" as applied to MP3 refers to the ability to mix coding modes within one file. In other words, a non-"joint stereo" MP3 will never contain a mixture of frame types.
Stereo coding methods or "modes"
Left-Right (L/R) or "Simple" Stereo (SS)
Simple stereo is the most straightforward method of coding a stereo signal: each channel is treated as a completely separate entity. This can be inefficient and may adversely impact quality (as compared to other modes) when both channels contain nearly identical signals (i.e., are mono or nearly so).
Mid-side Stereo (MS)
Mid-side stereo coding calculates a "mid" channel by adding the left and right channels, and a "side" channel by subtracting them, i.e.:
- M = (L + R) / 2, S = (L - R) / 2
- L = M + S, R = M - S
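The two formulas above can be checked with a minimal sketch. The function names and the sample values are illustrative assumptions and are not taken from any particular codec.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid/side: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Illustrative sample values (hypothetical), chosen to be exact in binary floating point.
left = [0.5, 0.25, -0.75, 1.0]
right = [0.5, 0.125, -0.5, 1.0]

mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

With these inputs the decoded channels equal the originals, and the side channel is zero wherever the two channels carry the same sample.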
Whenever a signal is concentrated in the middle of the stereo image (i.e. more mono-like), mid-side stereo can achieve a significant saving in bitrate, since one can use fewer bits to encode the side-channel. Even more important is the fact that by applying the inverse matrix in the decoder, the quantization noise becomes correlated and falls in the middle of the stereo image, where it is masked by the signal.
Unlike intensity stereo, which destroys phase information, mid-side coding is mathematically lossless (although subsequent lossy compression may cause phase degradation). Correctly implemented mid-side stereo does very little or no damage to the stereo image and increases compression efficiency, either by reducing size or by increasing overall quality. Mid-side is also simple enough to be implemented in FM radio and stereophonic vinyl records.
Mid-side stereo can use coefficients other than 1 in encoding and decoding. Letting each channel contribute a different amount allows the codec to adapt to off-balance sources while retaining the bitrate savings. This extension is found in Opus, where an angle can be encoded.
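As a rough illustration of non-unit coefficients, the sketch below parameterizes the transform by a single angle. It is an assumption made for illustration and does not reproduce how Opus or any other codec actually derives or codes its angle; the names `rotate_stereo` and `balance_angle` are hypothetical. At an angle of 45 degrees the transform reduces to ordinary mid-side coding up to a scale factor.

```python
import math


def rotate_stereo(a, b, theta):
    """Apply an angle-parameterized transform to two channels.

    first  = cos(theta) * a + sin(theta) * b
    second = sin(theta) * a - cos(theta) * b
    The matrix is orthogonal and its own inverse, so the same function
    both encodes (L, R -> M, S) and decodes (M, S -> L, R).
    """
    c, s = math.cos(theta), math.sin(theta)
    first = [c * x + s * y for x, y in zip(a, b)]
    second = [s * x - c * y for x, y in zip(a, b)]
    return first, second


def balance_angle(left, right):
    """Pick an angle from the channel amplitudes (assumes in-phase channels)."""
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    return math.atan2(math.sqrt(energy_r), math.sqrt(energy_l))


# Off-balance source: the right channel is the left channel at half amplitude.
left = [1.0, -0.5, 0.25, 0.75]
right = [0.5 * x for x in left]

theta = balance_angle(left, right)
mid, side = rotate_stereo(left, right, theta)
decoded_left, decoded_right = rotate_stereo(mid, side, theta)

# Side channel of the fixed-coefficient transform, for comparison.
plain_side = [(l - r) / 2 for l, r in zip(left, right)]
```

For this off-balance input the adapted side channel is zero up to rounding error, whereas the fixed-coefficient side channel still carries a quarter of the left channel's amplitude.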
Intensity Stereo (IS)
Intensity stereo coding is a method that achieves a saving in bitrate by replacing the left and the right signal with a single representative signal plus directional information (in the form of amplitude ratios for each frequency range). This replacement is psychoacoustically justified in the higher frequency range, since the human auditory system is insensitive to the signal phase at frequencies above approximately 2 kHz. To maintain the justification, a codec may restrict intensity stereo to the higher-frequency bands.
Intensity stereo is by definition a lossy coding method, so it is primarily useful at low bitrates. For coding at higher bitrates, only mid-side stereo should be used.
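The idea can be sketched for a single frequency band as follows. This is a simplified illustration under stated assumptions: the band is given as a short list of real coefficients, the shared signal is the plain average of the two channels, and the directional information is stored as one gain per channel. Real codecs quantize and signal this information differently, and all names here are hypothetical.

```python
import math


def band_energy(coeffs):
    """Sum of squared coefficients in one band."""
    return sum(x * x for x in coeffs)


def intensity_encode(left_band, right_band):
    """Replace two bands by one carrier plus a gain for each channel."""
    carrier = [(l + r) / 2 for l, r in zip(left_band, right_band)]
    e_carrier = band_energy(carrier)
    if e_carrier == 0.0:
        # Nothing to scale (e.g. channels cancel); the band decodes to silence.
        return carrier, 0.0, 0.0
    gain_l = math.sqrt(band_energy(left_band) / e_carrier)
    gain_r = math.sqrt(band_energy(right_band) / e_carrier)
    return carrier, gain_l, gain_r


def intensity_decode(carrier, gain_l, gain_r):
    """Rebuild both channels as scaled copies of the same carrier."""
    return [gain_l * x for x in carrier], [gain_r * x for x in carrier]


# Hypothetical coefficients for one high-frequency band.
left_band = [0.8, -0.4, 0.2, 0.1]
right_band = [0.4, -0.1, 0.3, -0.2]

carrier, gain_l, gain_r = intensity_encode(left_band, right_band)
decoded_left, decoded_right = intensity_decode(carrier, gain_l, gain_r)
```

The decoded bands keep the original per-channel energy, but both are now scaled copies of one signal, so any difference in waveform (and therefore phase) between the channels is gone. This is the sense in which the method is lossy.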
Parametric Stereo (PS)
Parametric stereo, found in HE-AAC, is similar to intensity stereo, except that the directional information also includes phase and correlation. The phase information also lets this algorithm preserve low-frequency location cues (interaural time differences), while the (de-)correlation information helps add ambience by synthesizing some difference between the channels.
Parametric stereo is also lossy and mostly useful at low bitrates. It is best seen as a refinement that makes intensity-style coding somewhat more bearable.
The phase aspect is covered by a few patents applied for between 1997 and 2000 (EP1107232A3, EP0797324A2), which should have expired. The patent on the ambience part (EP1927266B1) will expire in 2026, so do not expect any new experimental codec to use it yet.
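The three kinds of directional information (level, phase, correlation) can be estimated along the following lines. This is only a sketch: it assumes one band of complex subband samples per channel with non-zero energy, and the abbreviations IID, IPD and ICC follow common usage in the parametric stereo literature rather than this article. It is not the HE-AAC analysis procedure.

```python
import cmath
import math


def stereo_parameters(left_band, right_band):
    """Estimate (level difference in dB, phase difference in rad, coherence).

    Assumes both bands hold complex samples and have non-zero energy.
    """
    energy_l = sum(abs(x) ** 2 for x in left_band)
    energy_r = sum(abs(x) ** 2 for x in right_band)
    cross = sum(l * r.conjugate() for l, r in zip(left_band, right_band))
    iid_db = 10 * math.log10(energy_l / energy_r)       # inter-channel level difference
    ipd_rad = cmath.phase(cross)                        # inter-channel phase difference
    icc = abs(cross) / math.sqrt(energy_l * energy_r)   # inter-channel coherence, 0..1
    return iid_db, ipd_rad, icc


# Hypothetical band: right is left at half amplitude, phase-shifted by 45 degrees.
left_band = [1 + 0j, 0.5j, -0.5 + 0.5j, 0.25 - 1j]
shift = cmath.exp(-1j * math.pi / 4)
right_band = [0.5 * shift * x for x in left_band]

iid_db, ipd_rad, icc = stereo_parameters(left_band, right_band)
```

Here the level difference comes out at about 6 dB, the phase difference at a quarter of pi, and the coherence at 1, since the right channel is just a scaled and phase-shifted copy of the left. A coherence below 1 would indicate how much decorrelated "ambience" a decoder should synthesize.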
Some early MP3 encoders didn't make ideal decisions about what mode to use from frame to frame in joint stereo files, or how much bandwidth to allocate to encoding the side channel. This led to a widespread but mistaken belief that an abundance of M/S frames, or the use of joint stereo in general, always negatively impacts channel separation and other measures of audio quality. This is not an issue with modern encoders. Modern, optimized encoders will switch between mid-side coding and simple stereo coding as necessary, depending on the correlation between the left and right channels, and will allocate channel bandwidth appropriately, so that the best mode is used for each frame.
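A per-frame mode decision of this kind might look roughly like the sketch below. The energy-ratio heuristic, the threshold value and the function name are assumptions made for illustration; actual encoders base the decision on their psychoacoustic model rather than on a fixed threshold.

```python
def choose_mode(left, right, threshold=0.1):
    """Return "MS" when one of mid/side holds almost no energy, else "LR".

    Hypothetical heuristic: mid-side pays off when the channels are strongly
    correlated, which shows up as a very small mid or side channel.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    energy_mid = sum(x * x for x in mid)
    energy_side = sum(x * x for x in side)
    total = energy_mid + energy_side
    if total == 0.0:
        return "MS"  # silent frame: either mode works
    return "MS" if min(energy_mid, energy_side) / total < threshold else "LR"


# Two hypothetical frames: one nearly mono, one with unrelated channels.
frames = [
    ([1.0, -0.5, 0.25, 0.75], [0.9, -0.5, 0.25, 0.8]),
    ([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]),
]
modes = [choose_mode(left, right) for left, right in frames]
```

The nearly mono frame is assigned mid-side coding, while the frame with unrelated channels stays in simple stereo, which is the frame-by-frame switching described above.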
- See e.g. https://web.archive.org/web/20180714000735/http://jmvalin.ca/papers/aes135_opus_celt.pdf, sections 4.5 [IS frequency], 4.5.1 [M/S angle]
- Purnhagen, Heiko (October 5–8, 2004). "Low Complexity Parametric Stereo Coding in MPEG-4" (PDF). 7th International Conference on Digital Audio Effects: 163–168.