Joint stereo

From Hydrogenaudio Knowledgebase

Joint stereo is a property of an audio data stream and means that the stream supports more than one method of stereo coding, such as SS ("simple" or "L/R" stereo), MS ("mid-side" stereo), or IS ("intensity" stereo). A joint stereo stream may still only employ a single coding method, but for the sake of efficiency or quality may switch between methods on a frame or even sub-frame basis.

For example, a high-bitrate "joint stereo" MP3 file may contain a mixture of SS and MS frames, or it may contain all SS frames or all MS frames. A non-"joint stereo" MP3 will never contain a mixture of frame types.
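
As a rough illustration of how per-frame switching shows up in a file, the sketch below reads the channel mode and mode extension bits from a 4-byte MPEG-1/2 Layer III frame header. The function name and the returned dictionary layout are hypothetical choices made for this example; only the bit positions come from the MPEG audio frame header layout.

```python
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def frame_stereo_coding(header: bytes) -> dict:
    """Report the stereo coding signalled by one Layer III frame header.

    `header` holds the first 4 bytes of a frame. The function name and
    the result layout are illustrative, not part of any library.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    # Frame sync: the first 11 bits are all set.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync found")
    # Layer field: binary 01 means Layer III.
    if (header[1] >> 1) & 0x03 != 0b01:
        raise ValueError("not a Layer III frame")

    mode = (header[3] >> 6) & 0x03       # channel mode
    extension = (header[3] >> 4) & 0x03  # only meaningful in joint stereo
    joint = mode == 1
    return {
        "channel_mode": CHANNEL_MODES[mode],
        "mid_side": joint and bool(extension & 0b10),
        "intensity": joint and bool(extension & 0b01),
    }


# A joint stereo frame that uses mid-side coding for this frame only.
print(frame_stereo_coding(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

Running this over every frame of a joint stereo file would show which frames are coded as L/R and which as mid-side; locating the frame boundaries is left out of the sketch.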

Very few MP3 encoders make ideal decisions about which mode to use from frame to frame in joint stereo files. This has fostered a long-standing misconception that joint stereo means too many (perhaps 100%) mid-side frames, which is rarely desirable in stereo music and can result in haphazard channel separation or suboptimal quality as measured by other criteria.

"Joint stereo coding methods" in prose generally refers to whatever alternatives to simple (L/R) stereo coding are supported by a particular format, even though simple stereo is also an option.

Stereo coding methods or "modes"

Left-Right (L/R) or "Simple" Stereo (SS)

Simple stereo is the most straightforward method of coding a stereo signal: each channel is treated as a completely separate entity. This can be inefficient and may adversely impact quality (as compared to other modes) when both channels contain nearly identical signals (i.e., are mono or nearly so).

Mid-side Stereo (MS)

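Mid-side stereo coding calculates a "mid" channel by adding the left and right channels, and a "side" channel by subtracting them, i.e.:

    M = (L + R) / 2
    S = (L - R) / 2

The decoder recovers the original channels by applying the inverse matrix:

    L = M + S
    R = M - S

The normalization factor depends on the format (MP3, for instance, scales by 1/√2 rather than 1/2), but the principle is the same.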

Whenever a signal is concentrated in the middle of the stereo image (i.e. more mono-like), mid-side stereo can achieve a significant saving in bitrate, since one can use fewer bits to encode the side-channel. Even more important is the fact that by applying the inverse matrix in the decoder, the quantization noise becomes correlated and falls in the middle of the stereo image, where it is masked by the signal.
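
The bitrate argument can be made concrete with a small sketch. It assumes the common convention M = (L + R) / 2 and S = (L - R) / 2, works on plain lists of samples, and uses illustrative function names; it omits quantization, which is where the actual savings are realized in a real codec.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid/side samples."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Apply the inverse matrix to recover left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


# A nearly mono signal: the channels differ in a single sample.
left = [0.5, -0.25, 0.75, 0.125]
right = [0.5, -0.25, 0.625, 0.125]

mid, side = ms_encode(left, right)
print("side energy:", energy(side), "left energy:", energy(left))
print("round trip exact:", ms_decode(mid, side) == (left, right))
```

Because the side channel of a mono-like signal carries very little energy, an encoder can spend far fewer bits on it than it would on a full second channel.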

Unlike intensity stereo, which destroys phase information, mid-side coding keeps the phase information largely intact. Correctly implemented mid-side stereo does little or no damage to the stereo image and increases compression efficiency, either by reducing size or by increasing overall quality.

Intensity Stereo (IS)

Intensity stereo coding is a method that achieves a saving in bitrate by replacing the left and right signals with a single representative signal plus directional information. This replacement is psychoacoustically justified in the higher frequency range, since the human auditory system is insensitive to signal phase at frequencies above approximately 2 kHz.

Intensity stereo is by definition a lossy coding method, so it is primarily useful at low bitrates. For coding at higher bitrates, only mid-side stereo should be used.
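
The following toy sketch illustrates the idea of a single signal plus directional information, and why the method is lossy. It is not how any particular format implements intensity stereo: the summed carrier, the energy-based gains, and the function names are all assumptions made for the illustration, and a real encoder would apply this per frequency band rather than to raw samples.

```python
import math


def intensity_encode(left, right):
    """Reduce one band of stereo samples to a carrier plus two gains."""
    carrier = [l + r for l, r in zip(left, right)]
    carrier_energy = sum(c * c for c in carrier)
    if carrier_energy == 0:
        # Silent or fully out-of-phase band: nothing left to scale.
        return carrier, 0.0, 0.0
    gain_left = math.sqrt(sum(l * l for l in left) / carrier_energy)
    gain_right = math.sqrt(sum(r * r for r in right) / carrier_energy)
    return carrier, gain_left, gain_right


def intensity_decode(carrier, gain_left, gain_right):
    """Rebuild both channels as scaled copies of the carrier."""
    left = [gain_left * c for c in carrier]
    right = [gain_right * c for c in carrier]
    return left, right


# The right channel is quieter and phase-shifted relative to the left.
left = [math.sin(2 * math.pi * k / 16) for k in range(64)]
right = [0.5 * math.sin(2 * math.pi * k / 16 + 1.0) for k in range(64)]

carrier, g_left, g_right = intensity_encode(left, right)
dec_left, dec_right = intensity_decode(carrier, g_left, g_right)

# Per-channel energy survives, but the phase offset between channels does not.
print("largest right-channel error:",
      max(abs(a - b) for a, b in zip(right, dec_right)))
```

The decoded channels keep their original loudness relationship but are now exact scaled copies of one another, which is the loss of phase information described above.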


Additional information

Some more details about joint stereo and mid-side coding:

  • Buggy or poorly optimized encoders may implement mid-side coding incorrectly, making it sound worse than simple stereo. In principle (see the formulas above), the mid-side transform is exactly invertible, so there should be no inherent difference in quality between mid-side stereo and simple stereo.
  • Some older MP3 encoders interpret "joint stereo" to mean 100% mid-side, which for stereo music is almost certainly less than ideal.
  • Modern/optimized encoders will use mid-side coding or simple stereo coding as necessary, depending on the correlation between the left and right channels.
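
A per-frame decision based on channel correlation could be sketched as follows. This is a deliberately simplified heuristic, not the method of any real encoder: production encoders base the choice on a psychoacoustic model, and the function name and the `threshold` value here are hypothetical.

```python
def choose_stereo_mode(left, right, threshold=0.1):
    """Pick "MS" or "LR" for one frame from the side/mid energy ratio.

    `threshold` is an arbitrary illustrative value, not taken from
    any real encoder.
    """
    mid_energy = sum(((l + r) / 2) ** 2 for l, r in zip(left, right))
    side_energy = sum(((l - r) / 2) ** 2 for l, r in zip(left, right))
    # Highly correlated channels leave little energy in the side signal,
    # so mid-side coding is likely to be the cheaper representation.
    if side_energy <= threshold * mid_energy:
        return "MS"
    return "LR"


# Identical channels (mono content) favor mid-side coding.
print(choose_stereo_mode([0.5, -0.5, 0.25], [0.5, -0.5, 0.25]))

# A signal panned hard left favors plain left/right coding.
print(choose_stereo_mode([1.0, 1.0], [0.0, 0.0]))
```

Applying such a decision frame by frame yields the mixture of mid-side and simple stereo frames that a joint stereo stream is allowed to contain.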

External Links