LAME Y switch
Latest revision as of 17:24, 3 December 2019
This article describes the function of the -Y switch in the LAME command-line encoder, and tries to clarify what the switch does and what it does not do. Like joint stereo, it is frequently misinterpreted, and it is often mistaken for a filter.[1]
By explaining what the switch does, first in easy terms and then in technical terms, the article aims to give the reader a better understanding of the motivation behind the switch and of its usage.
The short definition
- The -Y switch tells LAME not to encode the highest frequencies accurately, if doing so causes disproportionate increases in bitrate.
Other ways to say it include:
- The -Y switch tells LAME to use a coarser representation for the highest frequencies, in the parts where encoding them accurately would cause an over-encoding of all the other bands.
- The -Y switch tells LAME to not be so strict with the higher frequencies, if they are going to cause an increase of bitrate.
- The -Y switch is not a lowpass filter.
- It allows high frequencies (>= 16 kHz) to exist; it just alters their accuracy. If their values are very small, it can quantize them to zero (although the psychoacoustic analyzer will probably decide to simply remove them instead).
The technical definition
How is audio stored in MP3
- MP3 audio is stored in the frequency domain (values for frequencies) instead of the time domain (values for samples).
- Frequencies are analyzed and stored in groups, known as bands.
- Bands are quantized to make them compress better.
- Scale factor refers to how much quantization (loss of precision) is applied to each band, where higher quantization causes greater compression, and consequently less variation between the minimum and maximum values (resolution).
- Each band has its own scale factor, so that its quantization can be adjusted independently from the others.
- The exception is scalefactor band 21 (sfb21), which does not have a scale factor. This band stores frequencies of 16 kHz and above.
- Global gain is an extra quantizer that affects all bands simultaneously.
(See the Notes section for more about scale factors and global gain.)
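The relationship between global gain, per-band scale factors and resolution can be illustrated with a minimal sketch. This is a toy model, not the actual MP3 bitstream arithmetic: the function names, the step formula and the sample values are assumptions chosen only to show that a band with a larger relative reduction is quantized more finely, while a band without a scale factor (such as sfb21) always uses the global step.

```python
def effective_step(global_gain, relative_factor):
    """Toy quantizer step: the per-band factor reduces the global step.

    Assumption for illustration only; a larger result means coarser quantization.
    """
    return 2.0 ** ((global_gain - relative_factor) / 4.0)


def quantize(values, step):
    """Map spectral values to integer codes; precision is lost here."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Reconstruct approximate values from the integer codes."""
    return [c * step for c in codes]


band = [9.0, -3.0, 0.4]

# A band with no reduction of its own (like sfb21) uses the global step directly.
coarse = quantize(band, effective_step(global_gain=8, relative_factor=0))

# A band whose scale factor reduces the step gets more resolution.
fine = quantize(band, effective_step(global_gain=8, relative_factor=8))

print(coarse, dequantize(coarse, effective_step(8, 0)))
print(fine, dequantize(fine, effective_step(8, 8)))
```

In this sketch the coarse codes need fewer distinct values, which is what makes them cheaper to store, at the cost of a larger reconstruction error.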
What is the scalefactor band 21 (sfb21) defect
- If the encoder determines that sfb21 needs more resolution, it has no way to decrease the scale factor of sfb21 alone, since there is no such scale factor.
- The only way to increase the resolution on sfb21 is therefore to reduce the global gain quantization, since global gain applies to all bands.
- Scale factors are stored as a relative value of global gain (just the difference is stored). Let's call this value the "relative factor".
- To balance the reduction of global gain, the scale factor of the other bands should increase. Consequently, the relative factor decreases.
- The relative factor can be decreased until it reaches zero. At that point, any further reduction of global gain implies that the band will use more resolution than needed.
- The encoder is forced to increase the bitrate, not only because of the frequencies at or above 16 kHz, but also because the bands below sfb21 end up with excessive resolution.
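Source: MP3' Tech - Mp3 Limitations, http://www.mp3-tech.org/content/?Mp3%20Limitations (archived on February 22, 2012 at https://web.archive.org/web/20120222124415/http://www.mp3-tech.org/content/?Mp3%20Limitations)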
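The chain of effects described in this section can be sketched numerically. The following is a toy model under stated assumptions: steps are expressed in arbitrary units where a larger number means coarser quantization, each band has a hypothetical "desired step" that its noise target allows, and the upper limit on the relative factor is ignored. The function name and all numbers are illustrative and do not come from LAME.

```python
def band_state(global_gain, desired_steps):
    """Return (relative_factors, excess_resolution) for the ordinary bands.

    relative factor: how much each band reduces the global step to reach
                     its desired step; it cannot go below zero.
    excess:          how much finer than desired a band is forced to be
                     once its relative factor is stuck at zero.
    """
    relative = [max(0, global_gain - d) for d in desired_steps]
    excess = [max(0, d - global_gain) for d in desired_steps]
    return relative, excess


desired = [40, 36, 30]  # hypothetical desired steps for three bands below sfb21

# Global gain set by the ordinary bands alone: nothing is over-encoded.
print(band_state(40, desired))

# Global gain lowered to 32 because sfb21 asked for more resolution:
# two bands hit a relative factor of zero and become finer than needed.
print(band_state(32, desired))
```

The second call shows the defect: the extra resolution in the first two bands costs bits although their own noise targets never asked for it.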
sfb21 and the -Y switch
LAME implements the -Y switch as a way to activate the alternate logic that CBR uses with respect to quantization noise in the sfb21 band.
- The encoder determines the desired quantization noise within the sfbs. The scale factors are chosen according to these values.
- If the -Y switch is not in effect (either implicitly or explicitly), sfb21 gets evaluated and the global gain is set accordingly.
- Adding -Y lets the encoder ignore whatever quantization noise will be in sfb21.
The result is that all the 16 kHz and above frequencies still get encoded.
The ones that would normally have needed higher resolution to satisfy the criteria of the psy-model don't receive that treatment, while the ones that wouldn't need higher resolution are unaffected by the -Y switch. The -Y switch prevents global gain quantization from being decreased solely to accommodate the needs of sfb21.
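The decision described above can be sketched as a small function. This is a simplified illustration, not LAME's actual quantization loop: the function name, the "desired step" values and the units (larger means coarser) are assumptions made for the example.

```python
def choose_global_gain(desired_steps, sfb21_desired, ignore_sfb21):
    """Pick a toy global gain (larger = coarser) for one block."""
    # Coarsest step any ordinary band tolerates; finer bands use their own factor.
    gain = max(desired_steps)
    if not ignore_sfb21:
        # Without -Y, the needs of sfb21 can pull the global gain down for every band.
        gain = min(gain, sfb21_desired)
    # How much coarser sfb21 ends up than the psy-model asked for.
    sfb21_shortfall = max(0, gain - sfb21_desired)
    return gain, sfb21_shortfall


bands = [40, 36, 30]  # hypothetical desired steps for the bands below sfb21

# Block whose sfb21 wants more resolution than the other bands need.
print(choose_global_gain(bands, 32, ignore_sfb21=False))  # (32, 0)
print(choose_global_gain(bands, 32, ignore_sfb21=True))   # (40, 8)

# Block whose sfb21 is undemanding: the result is the same either way.
print(choose_global_gain(bands, 44, ignore_sfb21=False))  # (40, 0)
print(choose_global_gain(bands, 44, ignore_sfb21=True))   # (40, 0)
```

In both cases sfb21 is still encoded; only the block with a demanding sfb21 is treated differently, and there the difference is coarser high frequencies rather than their removal.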
CBR/ABR and the -Y switch
The -Y switch can only be activated in VBR mode. By default, -V3 to -V9 use -Y; -V0 to -V2 do not. Consequently, adding -Y is only useful for the three highest-quality VBR settings (-V0, -V1 and -V2).
This is because in CBR and ABR modes, the encoder uses -Y implicitly. Specifically, LAME targets a given bitrate, and adjusts the quantization steps until that target is reached.
Since sfb21 does not have a scale factor of its own, its quantization noise is not evaluated.
This is the same treatment as using -Y in VBR mode.
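The rules in this section can be summarized in a short helper. This is a sketch based only on the defaults stated above; the function names and the file names are hypothetical, and the command is merely built as an argument list, not executed.

```python
def y_changes_encoding(mode, vbr_level=None):
    """Return True if adding -Y explicitly would alter the encoder's behaviour.

    Based on the article: CBR/ABR already behave as if -Y were set,
    and -V3 to -V9 enable -Y by default.
    """
    if mode != "vbr":
        return False
    return vbr_level in (0, 1, 2)


def lame_command(source, target, vbr_level, use_y=False):
    """Build a LAME argument list for a VBR encode, optionally with -Y."""
    args = ["lame", "-V", str(vbr_level)]
    if use_y:
        args.append("-Y")
    return args + [source, target]


print(y_changes_encoding("vbr", 2))  # True
print(y_changes_encoding("vbr", 5))  # False
print(y_changes_encoding("cbr"))     # False
print(lame_command("input.wav", "output.mp3", 2, use_y=True))
```

The resulting list could be passed to subprocess.run on a system where LAME is installed.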
Notes
For long blocks, the last scalefactor band is sfb21. For short blocks it is sfb12. The frequency at which it starts also depends on the sampling rate. The value of ~16 kHz is for 44.1 kHz material.
Global gain and scale factors are not independent when stored to the file. The latter are expressed as a difference from the former (the relative factor).
- The global gain is the global quantization step size, with a value range between 0 and 255.
- The relative factor per band is the amount by which the global quantization step size is reduced for that band. The range of this value is dependent on the band.
Consequently, only a limited range of values is available, and a change to global gain needs to be compensated for by a change in the relative factor.
This article has been written partly with comments from Aleron Ives, robert and benski.
See also
- MP3: Description of the MPEG Audio Layer III format
References
- [1] "-V2 gives way too high bitrate!?!" on hydrogenaudio: https://hydrogenaud.io/index.php?topic=79841.msg697111#msg697111