LAME Y switch

From Hydrogenaudio Knowledgebase

Revision as of 10:04, 1 April 2010

This article describes the function of the -Y switch on the LAME encoder command line.

The short definition

  • The -Y switch tells LAME not to encode the highest frequencies accurately if doing so would cause a disproportionate increase in bitrate.


Other ways to say it include:

  • The -Y switch tells LAME to use a coarser representation for the highest frequencies wherever encoding them accurately would cause over-encoding of all the other bands.
  • The -Y switch tells LAME to be less strict with the highest frequencies if they would otherwise cause an increase in bitrate.


The -Y switch is not a lowpass filter.
It allows high frequencies (16 kHz and above) to remain; it only alters their accuracy. If their values are very small, it can quantize them to zero (although the psychoacoustic model will probably decide to remove them outright instead).
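The difference from a lowpass filter can be shown with a toy spectrum. The sketch below is only an illustration under stated assumptions: it uses a plain uniform quantizer on the bins above a cutoff instead of MP3's power-law quantizer, and the bin values, cutoff index and step are invented. It shows that coarse quantization zeroes the small high-frequency values but keeps the larger ones, whereas a lowpass removes everything above the cutoff.

```python
def lowpass(coeffs: list[float], cutoff: int) -> list[float]:
    """Zero every bin at or above `cutoff` (what a lowpass filter would do)."""
    return [0.0 if i >= cutoff else x for i, x in enumerate(coeffs)]


def coarse_high_bins(coeffs: list[float], cutoff: int, step: float) -> list[float]:
    """Quantize bins at or above `cutoff` with a coarse uniform step; keep the rest.

    Simplification: a uniform quantizer stands in for MP3's power-law quantizer.
    """
    return [round(x / step) * step if i >= cutoff else x for i, x in enumerate(coeffs)]


spectrum = [5.0, 3.0, 0.2, 4.0]  # invented values; the last two bins stand in for 16 kHz and above
print(lowpass(spectrum, 2))                 # [5.0, 3.0, 0.0, 0.0]
print(coarse_high_bins(spectrum, 2, 2.0))   # [5.0, 3.0, 0.0, 4.0]
```

Only the small value (0.2) disappears under coarse quantization; the larger high-frequency value survives, just with less precision.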


The technical definition

How is audio stored in MP3

  • MP3 audio is stored in the frequency domain (values for frequencies) instead of the time domain (values for samples).
  • Frequencies are analyzed and stored in groups, known as bands.
  • Bands are quantized to make them compress better.
  • The scale factor determines how much quantization (loss of precision) is applied to each band: coarser quantization gives greater compression, but also fewer distinct levels between the minimum and maximum values (lower resolution).
  • Each band has its own scale factor, so that its quantization can be adjusted independently from the others.
  • The exception is scalefactor band 21 (sfb21), which does not have a scale factor. This band stores frequencies of 16 kHz and above.
  • Global gain is an extra quantizer that affects all bands simultaneously.

(See Notes and references for more about scale factors and global gain.)
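The storage model above can be illustrated with a simplified sketch of the per-band quantizer. It assumes the long-block form of the MPEG-1 Layer III dequantizer, in which the step exponent is (global_gain - 210)/4 minus scalefac_multiplier times scalefac, and it leaves out pre-emphasis, the encoder's rounding offset and Huffman coding. The function names are hypothetical. Passing scalefac=0 models sfb21, which has no scale factor of its own.

```python
def step_exponent(global_gain: int, scalefac: int, scalefac_multiplier: float = 0.5) -> float:
    """Base-2 exponent of a band's quantizer step (simplified long-block form).

    A larger global_gain coarsens every band; a larger scalefac makes only this band finer.
    """
    return (global_gain - 210) / 4 - scalefac_multiplier * scalefac


def quantize(x: float, global_gain: int, scalefac: int) -> int:
    """Scale by the step, compress with a 3/4 power law, round to an integer."""
    scaled = abs(x) / 2 ** step_exponent(global_gain, scalefac)
    q = round(scaled ** 0.75)
    return q if x >= 0 else -q


def dequantize(q: int, global_gain: int, scalefac: int) -> float:
    """Inverse of `quantize`: expand with a 4/3 power law and rescale by the step."""
    mag = abs(q) ** (4 / 3) * 2 ** step_exponent(global_gain, scalefac)
    return mag if q >= 0 else -mag


if __name__ == "__main__":
    # Same global gain; scalefac 0 is the only option sfb21 ever has.
    for sf in (0, 4):
        q = quantize(10.0, 210, sf)
        print(sf, q, dequantize(q, 210, sf))
```

With the same global gain, the band using scale factor 4 reconstructs 10.0 much more closely than a band stuck at scale factor 0, which is the position sfb21 is always in.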

What is the scalefactor band 21 (sfb21) defect

  • If the encoder determines that sfb21 needs more resolution, it has no way to decrease the scale factor of sfb21 alone, since there is no such scale factor.
  • The only way to increase the resolution on sfb21 is therefore to reduce the global gain quantization, since global gain applies to all bands.
  • The encoder can reduce the global gain as long as it is above zero.
  • If global gain is zero, resolution will need to be increased (and quantization lowered) on every other scale factor band.
  • The result is that unnecessary resolution is applied to every other band, so the bits used in all the other bands will increase, causing the bitrate to rise.
  • The encoder is forced to excessively increase the bitrate of the file just so that the frequencies >= 16 kHz will be adequately quantized.
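The chain of consequences above can be reproduced with a toy model. Each band has a target: the coarsest step exponent it can tolerate (smaller means finer). Scale factors can only make a band's step finer, never coarser, so once the global gain is lowered for the sake of sfb21, any band that is already finer than it needs to be keeps that surplus resolution. The targets, the single scale factor limit of 15 and the function names are assumptions for illustration; a real encoder works with measured noise and per-band limits.

```python
import math

SCALEFAC_MULTIPLIER = 0.5  # value used when scalefac_scale is 0
MAX_SCALEFAC = 15          # assumption: the same limit for every band (real limits vary per band)


def base_exponent(global_gain: int) -> float:
    """Step exponent a band gets from the global gain alone (scale factor 0)."""
    return (global_gain - 210) / 4


def coarsest_gain(target: float) -> int:
    """Largest global_gain whose base step meets `target` without help from a scale factor."""
    return math.floor(210 + 4 * target)


def surplus_resolution(targets: list[float], global_gain: int) -> float:
    """Total extra fineness (in step-exponent units) given to bands that have scale factors."""
    base = base_exponent(global_gain)
    total = 0.0
    for target in targets:
        # Smallest scale factor that brings this band down to its target.
        sf = math.ceil((base - target) / SCALEFAC_MULTIPLIER)
        sf = min(max(sf, 0), MAX_SCALEFAC)  # scale factors cannot go below 0
        total += target - (base - SCALEFAC_MULTIPLIER * sf)
    return total


ordinary = [2.0, 2.0, 2.0]  # a few of the bands below sfb21 (invented targets)
sfb21_target = -3.0         # sfb21 wants a much finer step

forced = coarsest_gain(sfb21_target)
# Coarsest gain at which the ordinary bands still reach their targets using MAX_SCALEFAC.
free = coarsest_gain(min(ordinary) + SCALEFAC_MULTIPLIER * MAX_SCALEFAC)
print(forced, surplus_resolution(ordinary, forced))  # 198 15.0
print(free, surplus_resolution(ordinary, free))      # 248 0.0
```

In this toy setting, satisfying sfb21 forces the global gain down to 198 and leaves each ordinary band 5 units finer than necessary, while at 248 the scale factors alone meet every target. In a real encoder that extra fineness costs bits.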

The -Y switch and the sfb21

LAME implements the -Y switch as a way to activate the alternate logic that CBR uses with respect to quantization noise in the sfb21 band.

  • The encoder determines the allowed quantization noise within each scalefactor band (sfb). The scale factors are chosen according to these values.
  • If the -Y switch is not in effect (either implicitly or explicitly), sfb21 is evaluated as well and the global gain is set accordingly.
  • Adding -Y lets the encoder ignore whatever quantization noise will be in sfb21.

The result is that all frequencies of 16 kHz and above still get encoded.

Those that would normally have needed higher resolution to satisfy the criteria of the psychoacoustic model don't receive it, while those that wouldn't need higher resolution are unaffected by the -Y switch. The -Y switch prevents the global gain quantization from being decreased solely to accommodate the needs of sfb21.
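The difference between evaluating and ignoring sfb21 can be sketched as a loop that keeps lowering the global gain while any evaluated band exceeds its allowed noise. For brevity, the noise of every band is approximated by the base step size 2^((global_gain - 210)/4), i.e. all scale factors are held at zero. The allowed-noise values and the `ignore_sfb21` flag are hypothetical stand-ins for the psychoacoustic thresholds and for -Y; this is not LAME's actual loop.

```python
def band_noise(global_gain: int) -> float:
    """Noise proxy: the base quantizer step, identical for all bands when scale factors are 0."""
    return 2 ** ((global_gain - 210) / 4)


def search_global_gain(allowed: list[float], ignore_sfb21: bool, start: int = 255) -> int:
    """Lower global_gain until every evaluated band is within its allowed noise.

    The last entry of `allowed` is sfb21. With ignore_sfb21=True (the -Y behaviour)
    it is skipped, so it can never force the gain down.
    """
    evaluated = allowed[:-1] if ignore_sfb21 else allowed
    gain = start
    while gain > 0 and any(band_noise(gain) > limit for limit in evaluated):
        gain -= 1
    return gain


allowed = [4.0, 4.0, 0.25]  # invented thresholds; the last entry (sfb21) is strict
print(search_global_gain(allowed, ignore_sfb21=False))  # 202 (without -Y)
print(search_global_gain(allowed, ignore_sfb21=True))   # 218 (with -Y)
```

The higher gain found with the flag set means coarser steps, and so generally fewer bits, in all the other bands, while sfb21 is still encoded, just less accurately than its threshold asked for.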


The -Y switch and CBR/ABR

The -Y switch can only be activated in VBR mode. By default, -V 3 to -V 9 use -Y, while -V 0, -V 1 and -V 2 do not. Consequently, adding -Y is only useful with the three highest-quality VBR settings (-V 0 to -V 2).

This is because in CBR and ABR modes, the encoder uses -Y implicitly. Specifically, LAME targets a given bitrate and adjusts the quantization steps until that target is reached.

Since sfb21 does not have its own scale factor, its quantization noise is not evaluated.

This is the same treatment as using -Y in VBR mode.
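The bitrate-driven behaviour of CBR and ABR can be sketched as a binary search for the finest global gain whose frame still fits a bit budget. The bit counter below is a deliberately crude, hypothetical stand-in for Huffman coding (a sign bit plus the binary length of each quantized value), and the coefficients and budget are invented. The point is that no per-band noise, including that of sfb21, takes part in the decision.

```python
def frame_bits(coeffs: list[float], global_gain: int) -> int:
    """Toy bit count; a larger global_gain means a coarser step and never more bits."""
    step = 2 ** ((global_gain - 210) / 4)
    bits = 0
    for x in coeffs:
        q = round((abs(x) / step) ** 0.75)
        bits += q.bit_length() + (1 if q else 0)  # magnitude bits plus a sign bit
    return bits


def fit_to_budget(coeffs: list[float], budget: int) -> int:
    """Binary search for the smallest global_gain (finest step) in 0..255 that fits `budget`.

    Returns 255 if even the coarsest step does not fit.
    """
    lo, hi = 0, 255
    while lo < hi:
        mid = (lo + hi) // 2
        if frame_bits(coeffs, mid) <= budget:
            hi = mid      # fits: try a finer step
        else:
            lo = mid + 1  # too many bits: go coarser
    return lo


coeffs = [100.0, 50.0, 10.0, 1.0, 0.3]  # invented spectral values; the last could lie in sfb21
gain = fit_to_budget(coeffs, budget=20)
print(gain, frame_bits(coeffs, gain))
```

Binary search works here because the toy bit count never increases as the global gain rises; only the bit budget decides where the search stops.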


Motivation behind this article

The article tries to clarify what the switch does and what it does not do. Like joint stereo, it is frequently misinterpreted, and it is often mistaken for a filter.

By explaining what the switch does, in both plain and technical terms, the article should give the reader a better understanding of the motivation behind the switch and of how to use it.


See also

Description of the MPEG layer 3 format

Hydrogenaudio thread discussing this article


Notes and references

In MPEG-1 (32, 44.1 and 48 kHz), the last scalefactor band is sfb21. In MPEG-2 (16, 22.05 and 24 kHz), it is sfb12. The frequency at which it starts also depends on the sampling rate; the value of ~16 kHz applies to 44.1 kHz material.

Global gain and scale factors are not independent: each scale factor is expressed as an offset from the global gain.

  • The global gain is the global quantization step size, with a value range between 0 and 255.
  • The scale factor of each band is the amount by which that band's quantization step size is reduced relative to the global step size. The range of this value depends on the band.

Consequently, only a limited set of values is available.
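The band-dependent range can be made concrete for MPEG-1 long blocks. As far as we know, the 4-bit scalefac_compress field selects a pair of bit widths from the MPEG-1 Layer III table reproduced below: slen1 for bands 0 to 10 and slen2 for bands 11 to 20, with no bits at all for sfb21. The function name is hypothetical, and the sketch ignores short blocks, MPEG-2 and scale factor sharing (scfsi).

```python
# MPEG-1 Layer III, long blocks: bit widths selected by scalefac_compress (0..15).
SLEN1 = [0, 0, 0, 0, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]  # bands 0-10
SLEN2 = [0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3]  # bands 11-20


def max_scalefac(scalefac_compress: int, sfb: int) -> int:
    """Largest scale factor a long-block band can carry; sfb21 carries none."""
    if not 0 <= scalefac_compress <= 15:
        raise ValueError("scalefac_compress must be in 0..15")
    if sfb <= 10:
        bits = SLEN1[scalefac_compress]
    elif sfb <= 20:
        bits = SLEN2[scalefac_compress]
    else:
        bits = 0  # sfb21: no scale factor is transmitted
    return (1 << bits) - 1


print([max_scalefac(15, sfb) for sfb in (0, 10, 11, 20, 21)])  # [15, 15, 7, 7, 0]
```

With scalefac_compress = 15, for example, bands 0 to 10 can use values 0 to 15 and bands 11 to 20 only 0 to 7, while sfb21 has nothing to adjust.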

This article was written partly based on comments from Aleron Ives, robert and benski.