Joint stereo: Difference between revisions

Revision as of 05:27, 16 July 2023

Joint stereo refers to any stereo-encoding method that goes beyond simple encoding as two independent channels (known as "simple" stereo, "L/R" stereo, or dual mono). These methods exploit the similarities between channels so that the available bits are spent more effectively, which typically increases audio quality at a given bitrate. They are not guaranteed to be transparent, however, and can cause audible artifacts, mostly with older encoders.

Some file formats, such as MP3, can switch among these modes on the fly on a frame or sub-frame basis, for the sake of efficiency or quality. For example, a high-bitrate "joint stereo" MP3 file may contain a mixture of simple stereo (SS) and mid-side (MS) frames, or it may contain all SS frames or all MS frames. Due to a historical accident, the term "joint stereo" as applied to MP3 refers to a file that is allowed to mix coding modes. In other words, a non-"joint stereo" MP3 will never contain a mixture of frame types.

Stereo coding methods or "modes"

Left-Right (L/R) or "Simple" Stereo (SS)

Simple stereo is the most straightforward method of coding a stereo signal: each channel is treated as a completely separate entity. This can be inefficient and may adversely impact quality (as compared to other modes) when both channels contain nearly identical signals (i.e., are mono or nearly so).

Mid-side Stereo (MS)

Mid-side stereo coding calculates a "mid" channel by adding the left and right channels, and a "side" channel by subtracting them, i.e.:

Encoding: M = (L + R) / 2, S = (L - R) / 2
Decoding: L = M + S, R = M - S

Whenever a signal is concentrated in the middle of the stereo image (i.e. more mono-like), mid-side stereo can achieve a significant saving in bitrate, since one can use fewer bits to encode the side-channel. Even more important is the fact that by applying the inverse matrix in the decoder, the quantization noise becomes correlated and falls in the middle of the stereo image, where it is masked by the signal.

Unlike intensity stereo, which destroys phase information, mid-side coding is mathematically lossless (although subsequent lossy compression may cause phase degradation). Correctly implemented mid-side stereo does very little or no damage to the stereo image and increases compression efficiency, either by reducing size or by increasing overall quality. Mid-side is also simple enough to be implemented in FM radio and stereophonic vinyl.
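The encoding and decoding formulas above can be sketched in a few lines of Python. This is only an illustration on raw sample values, not the code of any particular encoder; the function names and the sample data are made up for the example.

```python
def ms_encode(left, right):
    """Apply M = (L + R) / 2 and S = (L - R) / 2 sample by sample."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Apply L = M + S and R = M - S sample by sample."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squares, used here as a rough measure of how much there is to code."""
    return sum(x * x for x in samples)


# Hypothetical near-mono signal: the channels differ in one sample only.
left = [0.50, -0.25, 0.75, -1.00]
right = [0.50, -0.25, 0.50, -1.00]

mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

For this near-mono input the side channel is almost silent, so it needs far fewer bits than either original channel, and decoding returns the original samples exactly as long as nothing is quantized in between.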

Mid-side stereo can use coefficients other than 1 in encoding and decoding. Allowing different contributions from each channel lets the codec adapt to off-balance sources and retain the bitrate savings. This extension is found in Opus, where an angle can be encoded.^[1]
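One way to picture unequal coefficients is as a rotation of the (L, R) pair by an angle. The sketch below is a generic illustration under that assumption and is not the Opus algorithm; `balance_angle`, `rotate_encode` and `rotate_decode` are hypothetical names, and the angle estimate only suits channels that are in phase with each other.

```python
import math


def energy(samples):
    return sum(x * x for x in samples)


def balance_angle(left, right):
    """Estimate a rotation angle from the channel levels (assumes in-phase channels)."""
    return math.atan2(math.sqrt(energy(right)), math.sqrt(energy(left)))


def rotate_encode(left, right, angle):
    """Rotate (L, R) so that a panned mono source lands entirely in the mid channel."""
    c, s = math.cos(angle), math.sin(angle)
    mid = [c * l + s * r for l, r in zip(left, right)]
    side = [c * r - s * l for l, r in zip(left, right)]
    return mid, side


def rotate_decode(mid, side, angle):
    """Inverse rotation: recovers (L, R) from (mid, side) and the same angle."""
    c, s = math.cos(angle), math.sin(angle)
    left = [c * m - s * d for m, d in zip(mid, side)]
    right = [s * m + c * d for m, d in zip(mid, side)]
    return left, right


# Hypothetical off-balance source: the right channel is the left at half level.
left = [0.8, -0.4, 0.6, -1.0]
right = [0.5 * x for x in left]

angle = balance_angle(left, right)
mid, side = rotate_encode(left, right, angle)
decoded_left, decoded_right = rotate_decode(mid, side, angle)

# Plain mid-side with equal coefficients leaves a clearly non-zero side channel.
plain_side = [(l - r) / 2 for l, r in zip(left, right)]
```

With an angle of 45 degrees the rotation reduces to plain mid-side coding, up to scaling and the sign of the side channel. For the off-balance input above, the adapted angle drives the side channel to essentially zero, whereas the fixed coefficients do not.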

Intensity stereo

Intensity stereo coding is a method that achieves a saving in bitrate by replacing the left and right signals with a single representative signal plus directional information (in the form of amplitude ratios for each frequency range). This replacement is psychoacoustically justified in the higher frequency range, since the human auditory system is insensitive to the signal phase at frequencies above approximately 2 kHz.^[2] To maintain the justification, a codec may apply intensity stereo only to higher-frequency bands.^[1]

Intensity stereo is by definition a lossy coding method, so it is primarily useful at low bitrates. For coding at higher bitrates, only mid-side stereo should be used.
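The idea of "one signal plus amplitude ratios per frequency range" can be sketched as follows. This is a simplified illustration, not the MP3 or Opus procedure: the band layout, the choice of L + R as the shared signal, and the energy-based gains are all assumptions made for the example.

```python
import math


def energy(values):
    return sum(v * v for v in values)


def intensity_encode(left_bands, right_bands):
    """Replace each band pair with one carrier band plus a (left, right) gain pair."""
    carrier_bands, gains = [], []
    for left, right in zip(left_bands, right_bands):
        carrier = [l + r for l, r in zip(left, right)]
        carrier_energy = energy(carrier)
        if carrier_energy == 0.0:
            # Silent (or fully cancelling) band: nothing to scale.
            gains.append((0.0, 0.0))
        else:
            gains.append((math.sqrt(energy(left) / carrier_energy),
                          math.sqrt(energy(right) / carrier_energy)))
        carrier_bands.append(carrier)
    return carrier_bands, gains


def intensity_decode(carrier_bands, gains):
    """Rebuild both channels by scaling the shared carrier with the per-band gains."""
    left_bands = [[g_l * c for c in band]
                  for band, (g_l, _) in zip(carrier_bands, gains)]
    right_bands = [[g_r * c for c in band]
                   for band, (_, g_r) in zip(carrier_bands, gains)]
    return left_bands, right_bands


# One hypothetical high-frequency band with two spectral coefficients per channel.
left_bands = [[1.0, 0.5]]
right_bands = [[0.5, -0.25]]

carrier_bands, gains = intensity_encode(left_bands, right_bands)
decoded_left, decoded_right = intensity_decode(carrier_bands, gains)
```

The decoded channels keep the per-band energy of the originals, but both are scaled copies of the same carrier. The second coefficient of the right channel was negative in the input and comes back positive, which is the loss of phase information described above.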

Parametric stereo

Parametric stereo, found in HE-AAC, is similar to intensity stereo, except that the directional information also includes phase and correlation. The phase information makes this algorithm also capable of keeping low frequency location cues (by inter-aural time differences), while the (de-)correlation information helps add ambience by synthesizing some difference between channels.^[3]

PS replaces a whole channel with only 2-3 kbit/s of side information. As a result, the remaining channel gets almost double the bitrate to use, so the quality gain can more than make up for the lossiness of the process. It is not useful at high bitrates.
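The "almost double" figure follows from simple arithmetic. The sketch below works it through for a total of 24 kbit/s, which is an assumed example bitrate rather than a figure from the text, and uses the upper end of the 2-3 kbit/s side-information range.

```python
def per_channel_bitrate_simple(total_kbps):
    """Two independently coded channels share the total bitrate equally."""
    return total_kbps / 2


def core_bitrate_parametric(total_kbps, side_info_kbps=3.0):
    """With PS, one downmix channel gets everything except the side information."""
    return total_kbps - side_info_kbps


total_kbps = 24.0  # assumed example bitrate
simple = per_channel_bitrate_simple(total_kbps)
parametric = core_bitrate_parametric(total_kbps)
gain = parametric / simple
```

Here each channel would get 12 kbit/s under simple stereo, while the single PS downmix gets 21 kbit/s, a factor of 1.75. As the total bitrate rises, the factor approaches 2, but the extra bits matter less once each channel is already well coded.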

The phase aspect is covered by a few patents filed between 1997 and 2000 (EP1107232A3, EP0797324A2), which should have expired. The ambience part (EP1927266B1) will expire in 2026, so do not expect any new experimental codec to use it yet.

MPEG Surround and MPEG-H 3D Audio expand PS to surround and full spatial/ambisonics sound, respectively.

By format

MP3

MP3 supports dual-mono, M/S, and intensity methods. LAME does not support intensity stereo.

Some early MP3 encoders didn't make ideal decisions about what mode to use from frame to frame in joint stereo files, or how much bandwidth to allocate to encoding the side channel. This led to a widespread but mistaken belief that an abundance of M/S frames, or the use of joint stereo in general, always negatively impacts channel separation and other measures of audio quality. This is not an issue with modern encoders. Modern, optimized encoders will switch between mid-side coding and simple stereo coding as necessary, depending on the correlation between the left and right channels, and will allocate channel bandwidth appropriately to ensure the best mode is used for each frame.

LAME M/S is known to preserve the stereo image better than dual-mono in most circumstances, given the same bitrate budget. See Lossy.

Vorbis

Vorbis treats stereo information with square polar mapping, which is beneficial when the correlation between the left and right channels is strong (this can also be extended to multichannel coupling). In Vorbis, the spectrum of each channel is normalized against a floor function, which is a rough envelope of the actual spectrum. In the square polar mapping, the (stereo) phase is roughly defined as the difference between the normalized left and right amplitudes of a given frequency component. If the original left and right channels are the same within a certain frequency band, apart from an overall scaling factor, then the normalized frequency spectrum is the same for left and right, and the stereo phase is zero over the whole frequency band. Note that in the context of polar mapping, the term 'phase' (here: 'stereo phase') has a very different meaning from the phase of a periodic wave. Unlike the Fourier Transform, the Cosine Transform used in Vorbis and other encoders only provides amplitudes and no phases of the latter type.

Once the stereo information is represented in polar mapping as a magnitude and stereo phase, Vorbis can use three coupling methods:

Lossless coupling is equivalent to independent encoding of the two channels ('dual mono' in MP3), but with the benefit of additional space-saving. It does polar mapping/channel interleaving using the residue vectors.
In point stereo, the stereo phase is discarded completely. All the stereo information comes from the difference in the spectral floors for the left and right channels.
In phase stereo, the stereo phase is quantized, i.e. stored at a lower resolution. Especially above 4 kHz, the ear is not very sensitive to phase information. Phase stereo is not currently implemented in the reference encoder due to complexity, but is planned to be re-added later. Note that phase stereo should not be compared to intensity stereo in MP3 coding.
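The mapping behind these coupling methods can be sketched on a single pair of normalized residue values. The `decouple` function below assumes the rule has the form of the inverse coupling step in the Vorbis I specification; `couple` is just one possible forward mapping that inverts it, written for this example and not taken from the reference encoder. Both names are hypothetical.

```python
def couple(left, right):
    """Map a (left, right) residue pair to (magnitude, stereo phase)."""
    if abs(left) > abs(right):
        magnitude = left
        angle = left - right if left > 0 else right - left
    else:
        magnitude = right
        angle = left - right if right > 0 else right - left
    return magnitude, angle


def decouple(magnitude, angle):
    """Map (magnitude, stereo phase) back to a (left, right) residue pair."""
    if magnitude > 0:
        if angle > 0:
            return magnitude, magnitude - angle
        return magnitude + angle, magnitude
    if angle > 0:
        return magnitude, magnitude + angle
    return magnitude - angle, magnitude


# Lossless coupling: every integer pair in a small range survives the round trip.
pairs = [(l, r) for l in range(-4, 5) for r in range(-4, 5)]
round_trip_ok = all(decouple(*couple(l, r)) == (l, r) for l, r in pairs)

# Identical normalized values give a stereo phase of zero.
same_channels = couple(5, 5)

# Point stereo: discarding the stereo phase (angle = 0) yields identical residues,
# so any remaining stereo difference has to come from the spectral floors.
point_stereo = decouple(7, 0)
```

Because the mapping uses only additions, subtractions and comparisons, it is exactly reversible on quantized values, which is what allows lossless coupling to save space without changing the decoded result.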

Ogg Vorbis uses a mix of lossless and point stereo coupling below -q 6, and lossless channel coupling exclusively at high bitrates (-q 6 and up). This can be adjusted via an advanced-encode switch, but is not done for simplicity's sake.

External Links

References

[1] See e.g. https://web.archive.org/web/20180714000735/http://jmvalin.ca/papers/aes135_opus_celt.pdf, sections 4.5 [IS frequency], 4.5.1 [M/S angle]
[2] http://www.hydrogenaudio.org/forums/index.php?showtopic=1491&view=findpost&p=14091
[3] Purnhagen, Heiko (October 5–8, 2004). "LOW COMPLEXITY PARAMETRIC STEREO CODING IN MPEG-4" (PDF). 7th International Conference on Digital Audio Effects: 163–168.

@@ Line 31: / Line 31: @@
 [[Parametric stereo]], found in HE-AAC, is similar to intensity stereo, except that the directional information also includes phase and correlation. The phase information makes this algorithm also capable of keeping low frequency location cues (by inter-aural time differences), while the (de-)correlation information helps add ambience by synthesizing some difference between channels.<ref name=LC-M4>Purnhagen, Heiko (October 5–8, 2004). [http://dafx.de/paper-archive/2004/P_163.PDF LOW COMPLEXITY PARAMETRIC STEREO CODING IN MPEG-4]" (PDF). 7th International Conference on Digital Audio Effects: 163–168.</ref>
-Parametric stereo is also lossy and mostly useful at low bitrates. It's only something that makes intensity stereo a little more bearable.
+PS replaces a whole channel with only 2-3 kbit/s of side information. As a result, the remaining channel gets almost double the bitrate to use, so the quality gain can more than makes up for the lossiness of the process. It is not useful at high bitrate.
 The phase aspect is covered by a few patents applied in 1997~2000 (EP1107232A3, EP0797324A2), which should have expired. The ambience part (EP1927266B1) will expire in 2026, so do not expect any new experimental codec to use it yet.
@@ Line 61: / Line 61: @@
 * [http://en.wikipedia.org/wiki/Joint_stereo joint stereo at Wikipedia]
 * [http://www.xiph.org/vorbis/doc/stereo.html Ogg Vorbis stereo-specific channel coupling] at xiph.org.
+* [http://www.codingtechnologies.com/products/paraSter.htm Parametric Stereo at Coding Technologies]
 ==References==
@@ Line 66: / Line 68: @@
 [[Category:Technical]]
+[[Category:Algorithms]]