LAME Y switch

From Hydrogenaudio Knowledgebase
Revision as of 11:59, 31 March 2010 by JAZ (Talk | contribs)

This article describes the function of the -Y switch on the LAME encoder command line.

The short definition

  • The -Y switch tells LAME not to encode the highest frequencies accurately if doing so would cause a disproportionate increase in bitrate.


Other ways to say it include:

  • The -Y switch tells LAME to use a coarser representation for the highest frequencies in the parts where representing them accurately would over-encode all the other bands.
  • The -Y switch tells LAME not to be so strict with the higher frequencies if they are going to cause an increase in bitrate.


The -Y switch is not a lowpass filter.
It allows high frequencies (>= 16 kHz) to exist; it only changes how accurately they are stored. If their values are very small, they may be quantized to zero (although the psychoacoustic analyzer would probably decide to remove them anyway).


The technical definition

How is audio stored in MP3

  • MP3 audio is stored in the frequency domain (values for frequencies) instead of the time domain (values for samples).
  • Frequencies are analyzed and stored in groups, known as bands.
  • Bands are quantized to make them compress better.
  • Scale factor refers to how much quantization (loss of precision) is applied to each band. Higher quantization gives greater compression, but also fewer distinct levels between the minimum and maximum values (lower resolution).
  • Each band has its own scale factor, so that its quantization can be adjusted independently from the others.
  • Global gain is an extra quantizer that affects all bands simultaneously.
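
The trade-off between quantization and resolution described above can be sketched with a plain uniform quantizer. This is only an illustration: the function names and sample values are made up, and the real MP3 quantizer is more elaborate than simple rounding.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Rebuild approximate values from the stored integer codes."""
    return [c * step for c in codes]


def max_error(values, step):
    """Largest difference between the original and the rebuilt values."""
    restored = dequantize(quantize(values, step), step)
    return max(abs(v - r) for v, r in zip(values, restored))


# Made-up frequency-domain values for a single band.
band = [0.93, -2.41, 5.07, 0.18]

fine_step = 0.25    # little quantization: large codes, small error
coarse_step = 2.0   # heavy quantization: small codes, large error

fine_codes = quantize(band, fine_step)      # [4, -10, 20, 1]
coarse_codes = quantize(band, coarse_step)  # [0, -1, 3, 0]
```

The coarse codes are smaller numbers and therefore cheaper to store, but `max_error(band, coarse_step)` is much larger than `max_error(band, fine_step)`.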

What is the scalefactor band 21 (sfb21) defect

  • The last band is called sfb21, and does not have a scale factor. This band stores frequencies at 16 kHz and above.
  • If the encoder determines that sfb21 needs more resolution, it has no way to decrease the scalefactor of sfb21 alone, since there is no such scale factor.
  • The only way to increase the resolution on sfb21 is therefore to reduce the global gain quantization.
  • The encoder can reduce the global gain as long as it is above zero.
  • If global gain is zero, resolution will need to be increased (and quantization be lowered) on every other scale factor band.
  • The result is that unnecessary resolution is applied to every other band, so the bits used in all the other bands will increase and ultimately, the bitrate too.
  • The encoder is forced to raise the bitrate of the file excessively, just so that the frequencies at 16 kHz and above are adequately quantized.
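
The effect of the defect can be illustrated with a small toy calculation. It assumes, purely for the sketch, that a band with its own scale factor can be made finer than the global setting but not coarser, and that the band without one always follows the global setting. The band names, step values, and helper functions are hypothetical and are not taken from LAME.

```python
import math

# Coarsest quantization step each band tolerates before the noise becomes
# audible. All numbers are made up for illustration.
needed_step = {"sfb19": 8.0, "sfb20": 8.0, "sfb21": 1.0}


def effective_steps(needed, global_step, no_scalefactor="sfb21"):
    """Toy assumption: a band with a scale factor may go finer than the
    global step but never coarser; sfb21 always uses the global step."""
    steps = {}
    for band, step in needed.items():
        if band == no_scalefactor:
            steps[band] = global_step
        else:
            steps[band] = min(step, global_step)
    return steps


def extra_bits_per_value(needed, steps):
    """Rough proxy: halving the step costs about one more bit per value."""
    return {band: math.log2(needed[band] / steps[band]) for band in needed}


# Nothing else can refine sfb21, so the global step drops to what it needs.
global_step = needed_step["sfb21"]
steps = effective_steps(needed_step, global_step)
wasted = extra_bits_per_value(needed_step, steps)
```

In this toy example `wasted` reports about 3 extra bits per value for sfb19 and sfb20, bits that those bands did not need, and 0 for sfb21.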

The -Y switch and the sfb21

LAME implements the -Y switch as a way to activate the alternate logic that CBR uses with respect to quantization noise in the sfb21 band.

  • The encoder determines the desired quantization noise within the scale factor bands (sfbs). The scale factors are chosen according to these values.
  • If the -Y switch is not in effect (either implicitly or explicitly), sfb21 gets evaluated and the global gain is set accordingly.
  • Adding -Y lets the encoder ignore whatever quantization noise there will be in sfb21.

The result is that all the frequencies at 16 kHz and above still get encoded. The ones that would normally have needed higher resolution to satisfy the criteria of the psy-model do not receive that treatment, while the ones that would not need higher resolution are unaffected by the -Y switch.
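
The difference between evaluating and ignoring sfb21 can be sketched as a small search loop. This is a simplified model and not LAME's actual code: the uniform quantizer, the halving search, the function names, and all the numbers are assumptions chosen for illustration, with `ignore_sfb21` standing in for -Y.

```python
def noise(values, step):
    """Largest rounding error when values are quantized with `step`."""
    return max(abs(v - round(v / step) * step) for v in values)


def pick_global_step(bands, allowed_noise, ignore_sfb21, start=8.0):
    """Halve the global step until every evaluated band meets its target.

    With ignore_sfb21=True (the toy counterpart of -Y), the noise in
    sfb21 is simply not checked.
    """
    step = start
    while True:
        checked = [b for b in bands if not (ignore_sfb21 and b == "sfb21")]
        if all(noise(bands[b], step) <= allowed_noise[b] for b in checked):
            return step
        step /= 2


# Made-up band contents and noise targets.
bands = {"sfb20": [40.0, -24.0, 16.0], "sfb21": [13.0, -21.0, 6.0]}
allowed_noise = {"sfb20": 4.0, "sfb21": 0.25}

without_y = pick_global_step(bands, allowed_noise, ignore_sfb21=False)  # 1.0
with_y = pick_global_step(bands, allowed_noise, ignore_sfb21=True)      # 8.0

# sfb21 is still encoded with -Y, only with the coarser step.
sfb21_codes_with_y = [round(v / with_y) for v in bands["sfb21"]]  # [2, -3, 1]
```

In this sketch the sfb21 values survive as non-zero codes in both cases. What changes is the step chosen for the whole frame, and with it the number of bits spent on every band.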


The -Y switch and CBR/ABR

The -Y switch is used along with the VBR modes.

For CBR and ABR, the encoder uses -Y implicitly. Concretely, the encoder targets a given bitrate and adjusts the quantization steps until that target is reached. Since sfb21 does not have its own scale factor, its quantization noise is not evaluated.


Motivation for this article

This article tries to clarify what the switch does and what it does not do. Just like joint stereo, it is frequently misinterpreted, and it is often mistaken for a filter. By explaining what it does, both in simple terms and in technical terms, the article should give the reader a better understanding of the motivation for the switch and of its usage.


Notes

In MPEG-1 (32, 44.1, 48 kHz), the last scale factor band is sfb21. In MPEG-2 (16, 22.05, 24 kHz), it is sfb12. The frequency at which it starts also depends on the sampling rate. The value of ~16 kHz is for 44.1 kHz material.
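
The dependence on the sampling rate can be worked out with a little arithmetic. The sketch below assumes 576 frequency lines per granule and takes the first line of sfb21 as 418 at 44.1 kHz and 384 at 48 kHz. These indices are recalled from the MPEG-1 Layer III scale factor band tables for long blocks and should be checked against the standard before relying on them. The names are made up for this example.

```python
# First frequency line of sfb21 in a long block, per sampling rate.
# Assumed values recalled from the MPEG-1 Layer III tables; verify before use.
SFB21_FIRST_LINE = {44100: 418, 48000: 384}
LINES_PER_GRANULE = 576


def sfb21_start_hz(sample_rate):
    """Frequency at which sfb21 begins, for the given sampling rate in Hz."""
    nyquist = sample_rate / 2
    return SFB21_FIRST_LINE[sample_rate] * nyquist / LINES_PER_GRANULE


start_44k = sfb21_start_hz(44100)  # roughly 16001.6 Hz
start_48k = sfb21_start_hz(48000)  # 16000.0 Hz
```

Under these assumptions, both results agree with the ~16 kHz figure quoted above.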


See also

Description of the MPEG layer 3 format