From Hydrogenaudio Knowledgebase

Here's where I keep track of things I want to incorporate into the main article space someday.

Most of this has to do with documenting the various methods of MP3 splitting and gapless playback.

LAME bugs

CBR quality

There's a bug that affects CBR, especially at or near 128 kbps, at default and lower quality levels. One user reported "warbling, popping, clicking sounds". On lame-dev, I reported that higher frequencies are briefly attenuated just after the beginning of the stream, sometimes audibly affecting gapless playback (see the animated GIF). The attenuation I spotted was minor in 3.96 and 3.96.1, but major in 3.97 through 3.98.4. Robert Hegemann thinks the bug is related to preset tuning and that it may have appeared as early as LAME 3.94 beta, but I did a quick test and found that it seems to have first manifested in LAME 3.96, at least for default settings (lame infile.wav with no command-line options). The issue was apparently fixed in LAME 3.99a13 by making CBR use the newer VBR psychoacoustic model and quantization code.

nogaptags writes wrong delay & padding values

Incorrect encoder delay and padding values are written to the LAME tags in the VBR header frame of each file created with --nogaptags --nogap. Although Gabriel Bouvigne has been vocal about ditching --nogap functionality altogether, Robert Hegemann says he hasn't made up his mind yet and that I should go ahead and file a bug report, which I did (see the details there).

LAME questions

Possible nogap quality issue

As I wrote on lame-dev on 5 Apr 2011:

One of Gabriel's complaints was that with --nogap there's "a potential quality decrease at the beginning and at the end" of each file[1] and he claims that using delay & padding values from the LAME tag "is way more elegant, and at least it does not reduce quality of individual tracks (unlike --nogap)."[2]

The current documentation describes --nogap thusly:

Encodes multiple files (ordered by position) which are meant to be played gaplessly. By default, LAME will encode the files with accurate length, but the first and last frame may contain a few erroneous samples for signals that don't fade-in/out (as is the case of continuous playback). This setting solves that by using the samples from the next/previous file to compute the encoding.

This is an incomplete description of the functionality, but seems correct.

Gabriel's comments, however, seem to contradict the documentation [he says nogap has quality issues at the boundaries, the docs say nogap fixes quality issues at the boundaries].

I was unable to find any explanation of the quality problems he was referring to. He seems to be saying the problem is with the way the files are encoded, but that it manifests only when playing them back separately.

Is it related to the bit reservoir? (Is the reservoir flushed by the last frame?) Or is it really just a decoding issue? — Like, the decoder doesn't have access to frames from adjacent files when playing the files separately, so even though good data is encoded, you get some different samples in the output than you would get when decoding the same portion of the concatenated input?

That kind of problem wouldn't be unique to nogap files, though, so it wouldn't be solved by using delay & padding values from the LAME tag instead of --nogap. So I just don't understand what he's talking about.

Boundary issue research

First 32 samples

The LAME tech FAQ makes several claims related to erroneous samples at the beginning of a stream:

  • The first 96 samples will be attenuated due to MDCT overlap. (288 samples prior to LAME 3.56; or 240 prior to 3.56beta, according to the changelog. Which is correct?)
  • The first 576 samples will have psychoacoustic errors.
  • Encoder delay of 576 makes these first two issues moot!
  • The first 608 samples will be corrupt due to polyphase filterbank interaction with the 96 attenuated samples.

I deduce from this that only the first 32 samples (that's 608 minus 576) are really at risk. However, I can't tell from my testing whether there really is any problem with the first 32 samples. If they're bad, they're not majorly so, and I have no idea how to check.
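To make the arithmetic explicit, here is a minimal sketch that takes the three spans claimed by the tech FAQ and subtracts an encoder delay from each. The dictionary and function names are my own; only the numbers come from the FAQ claims listed above. Passing a delay of 0 shows what the same claims would imply for a stream that has no encoder delay at all.

```python
# Spans (in samples) quoted from the LAME tech FAQ claims listed above.
FAQ_AFFECTED_SPANS = {
    "MDCT overlap attenuation": 96,
    "psychoacoustic errors": 576,
    "polyphase filterbank corruption": 608,
}


def at_risk_samples(encoder_delay):
    """For each claim, count the real (non-delay) samples still affected.

    The first `encoder_delay` samples of the stream are not real audio,
    so only the part of each span beyond the delay matters.
    """
    return {
        name: max(0, span - encoder_delay)
        for name, span in FAQ_AFFECTED_SPANS.items()
    }


if __name__ == "__main__":
    print(at_risk_samples(576))  # ordinary file with a 576-sample delay
    print(at_risk_samples(0))    # stream with no encoder delay
```

With a delay of 576, only the filterbank claim leaves anything over, and what is left is the 32 samples mentioned above.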

However, I made a set of .wav files that alternate between simple, full-scale sine and triangle waves, and used --nogap to encode them. Separately decoding them with --decode --decode-mp3delay -529 revealed slight frequency and amplitude problems at the beginning of each file after the first one. Since those files (after the first) have no encoder delay, this seems to confirm that psychoacoustic errors exist in this region.
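For anyone wanting to repeat this kind of test, here is a rough sketch of how such a set of test files could be generated with the Python standard library. It is not the script I used. The sample rate, tone frequency, duration, file names and the reading that whole files alternate between the two waveforms are all assumptions made for illustration.

```python
import math
import os
import struct
import wave

SAMPLE_RATE = 44100  # assumed; not stated in the notes above
FULL_SCALE = 32767   # 16-bit full scale


def sine(freq, n):
    """Return n samples of a full-scale sine wave."""
    return [round(FULL_SCALE * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
            for i in range(n)]


def triangle(freq, n):
    """Return n samples of a full-scale triangle wave starting at the trough."""
    out = []
    for i in range(n):
        phase = (freq * i / SAMPLE_RATE) % 1.0
        # Rise from -1 to +1 over the first half period, fall back over the second.
        value = 4 * phase - 1 if phase < 0.5 else 3 - 4 * phase
        out.append(round(FULL_SCALE * value))
    return out


def write_wav(path, samples):
    """Write 16-bit mono PCM samples to a .wav file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


def make_test_set(directory, count=4, seconds=1.0, freq=441.0):
    """Write `count` files that alternate sine, triangle, sine, ..."""
    n = round(SAMPLE_RATE * seconds)
    paths = []
    for k in range(count):
        make = sine if k % 2 == 0 else triangle
        path = os.path.join(directory, "track%02d.wav" % (k + 1))
        write_wav(path, make(freq, n))
        paths.append(path)
    return paths
```

Calling make_test_set("some_directory") writes track01.wav through track04.wav, which can then be encoded as a --nogap set. The abrupt change of waveform at each file boundary is what makes seam problems easy to spot in an editor.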

This leads me to think that ideal gapless playback of this type of file, if there's no way to avoid resetting the decoder between files, requires more than getting the last samples from the file that's ending (which may be tricky, since it has no padding). It also requires a crossfade, fading out the file that's ending while fading in the file that's beginning, so that when the first sample after the decoder delay is played, it's not too different from the last sample played from the previous file.
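The crossfade idea can be sketched as below. This is only an illustration of a plain linear crossfade, and it rests on an assumption the notes above don't establish: that the player can get two equal-length blocks of decoded samples covering the same stretch of time around the seam, one from the file that's ending and one from the file that's beginning. The function name and the choice of a linear ramp are mine.

```python
def crossfade(tail, head):
    """Linearly crossfade two equal-length sample blocks.

    `tail` is from the file that's ending (faded out) and `head` is from
    the file that's beginning (faded in). Both are assumed to cover the
    same time span around the seam.
    """
    if len(tail) != len(head):
        raise ValueError("tail and head must have the same length")
    n = len(tail)
    out = []
    for i in range(n):
        gain_in = (i + 1) / (n + 1)  # weight of the incoming file, rising toward 1
        out.append((1.0 - gain_in) * tail[i] + gain_in * head[i])
    return out


if __name__ == "__main__":
    print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The ramp never reaches exactly 0 or 1 inside the block, so the samples just before and after the crossfaded region can be taken unmodified from the ending and beginning files respectively.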

Samples at end of truncated stream

The tech FAQ kind of glosses over what happens at the end of a truncated stream, but from what I've read, it's roughly the same problem as when the truncation is at the beginning: there's nothing to overlap with but silence/padding, so samples near the end of a split/"nogap" file are likely to be a little different than their counterparts in the encode/decode of the whole set.

My own testing, comparing separately-decoded "nogap" files to a pre-joined decode of the same files, confirms that there are indeed inaudible sample differences that start about 3 granules'-worth of samples before the first seam, as I expected. And the samples never match after that point, as can be expected with the decoder being reset between files as part of the test.
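The comparison itself is simple enough to sketch. The code below is not the tool I used; it assumes both decodes have already been loaded as plain sequences of integer samples, already aligned to the same starting point, and that the position of the first seam in samples is known. The function names are hypothetical. The 576-sample granule size is the standard one for MPEG-1 Layer III.

```python
GRANULE = 576  # samples per granule in MPEG-1 Layer III


def first_mismatch(reference, candidate):
    """Return the index of the first differing sample, or None if identical."""
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return i
    if len(reference) != len(candidate):
        # One sequence is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None


def granules_before_seam(reference, candidate, seam):
    """Distance from the first mismatch to the seam, in granules.

    Positive values mean the mismatch starts before the seam.
    Returns None when the two decodes are identical.
    """
    idx = first_mismatch(reference, candidate)
    if idx is None:
        return None
    return (seam - idx) / GRANULE
```

Here reference would be the pre-joined decode and candidate the concatenation of the separately decoded files.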

There's also a big problem, alluded to above, in that decoder behavior is ambiguous when dealing with streams which have no padding. A decoder (of a typical MP3 file) will output a block of 1152 samples per frame of input. But the decoder delay of 529 samples causes this output to be offset relative to what's actually in the frames; the first 529 samples output are junk, and the rest of the first block is the first 623 samples (1152 minus 529) from the first frame. The last 529 samples of the last frame are typically lost, which is why there's usually at least that much padding added by the encoder.
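The offset described above works out as follows. This is just the arithmetic from the paragraph written as a small sketch; the function and key names are made up, and it models a decoder that emits exactly one 1152-sample block per frame and is never flushed.

```python
FRAME_SAMPLES = 1152  # samples output per frame
DECODER_DELAY = 529   # decoder delay in samples


def output_layout(num_frames):
    """Describe where real audio lands in an unflushed decoder's output."""
    total = num_frames * FRAME_SAMPLES
    return {
        "total_output": total,
        "junk_at_start": DECODER_DELAY,
        # What remains of the first output block after the junk.
        "real_in_first_block": FRAME_SAMPLES - DECODER_DELAY,
        # Everything after the junk is real, but the stream's tail never arrives.
        "real_samples_output": total - DECODER_DELAY,
        "lost_at_end": DECODER_DELAY,
    }


if __name__ == "__main__":
    print(output_layout(10))
```

For any frame count, the samples lost at the end equal the junk at the start, which is why padding of at least 529 samples hides the loss.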

So what do you do if you are a decoder front-end and you know that this is a file whose padding is smaller than the decoder delay? For gapless playback, if you can't avoid resetting the decoder between files, you have to output those last 529 samples. Do decoders always provide these samples to the front-end, or do they trim (or not decode) them before the front-end sees them? If they don't provide access to them, then it seems you'd have to feed the decoder an extra frame of padding and then trim the excess output.
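Here is a sketch of the "feed an extra frame and trim" idea, under my own simplified model: the decoder emits 1152 samples per frame, its output lags the stream by 529 samples, and feeding one extra (assumed silent) frame pushes out the samples that would otherwise be lost. None of this describes how any particular decoder or front-end actually works, and the function and key names are hypothetical.

```python
FRAME_SAMPLES = 1152
DECODER_DELAY = 529


def plan_trim(num_frames, enc_delay, padding):
    """Work out which slice of decoder output is real audio.

    `enc_delay` and `padding` are the declared encoder delay and padding
    in samples. Returns the start and end (exclusive) of the real audio
    in decoder output, whether an extra frame must be fed to reach the
    end, and how much output is left over to trim after it.
    """
    stream = num_frames * FRAME_SAMPLES
    start = DECODER_DELAY + enc_delay
    end = DECODER_DELAY + stream - padding
    # Without flushing, only `stream` output samples exist. One extra
    # frame is always enough because 529 is less than 1152.
    extra_frames = 0 if end <= stream else 1
    available = (num_frames + extra_frames) * FRAME_SAMPLES
    return {
        "start": start,
        "end": end,
        "extra_frames": extra_frames,
        "trim_tail": available - end,
    }


if __name__ == "__main__":
    print(plan_trim(10, 0, 0))       # split/nogap style file, no delay or padding
    print(plan_trim(10, 576, 1000))  # ordinary file with ample padding
```

In this model an extra frame is needed exactly when the padding is smaller than 529, which is the case in question.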

foobar2000 matches the behavior of lame --decode (unless --decode-mp3delay is used), trimming exactly (never less than) 529 samples whenever the encoder delay declared in a LAME tag is 529 or less. I feel this is incorrect behavior, but fb2k forum gurus kode54 and Yirkha seem to feel it's correct. Regardless, this means that gapless playback is only possible in foobar2000 when using files with encoder delay & padding declared in the LAME tag, and the padding must be at least 529. So it's fine if used with ordinary LAME encoded files, but can't be used with split/nogap files.

On the other hand, on lame-dev, mpg123 maintainer Thomas Orgis explained to me how mpg123 handles this situation. He said it does exactly what I feel is correct: it gets all the samples from the decoder, and if the padding is declared to be less than the decoder delay, it trims only the appropriate amount (529 minus the padding, I think).

Is any of this covered in the MPEG specs? I have the specs but can't make heads or tails of them. I'd like to know if what I'm advocating is noncompliant.

foobar2000 bug

In addition to the issue mentioned above...

Bug: Headerless CBR MP3 duration estimate off by 529 samples

Description: foobar2000 properly strips decoder delay (529 samples) from MP3s during playback, but for certain CBR files, foobar2000 doesn't strip these samples from the duration estimate as shown in the properties window. The affected files are CBR files without a VBR header, or, I assume, those with a VBR header that contains no duration info. This is most easily noticed in the file integrity verifier, which compares the estimated duration to the actual playback duration. For these CBR files, the comparison is always unequal (off by 529 samples), so they're misreported as problematic.

Proposed solution: I believe if fb2k were to simply reduce the estimated duration of this class of CBR files by 529 samples, the problem would be solved.
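As a sketch of the proposed adjustment: I don't know how foobar2000 actually computes its estimate, so the code below simply assumes the estimate for a headerless CBR file is the frame count times 1152 samples, and shows the effect of subtracting the decoder delay. The function name and parameters are hypothetical.

```python
FRAME_SAMPLES = 1152
DECODER_DELAY = 529


def estimated_duration_samples(num_frames, strip_decoder_delay=True):
    """Estimate the playable length of a headerless CBR file in samples.

    Assumes one 1152-sample block per frame. With strip_decoder_delay
    set, the 529 samples that are stripped during playback are also
    removed from the estimate, so the two figures agree.
    """
    total = num_frames * FRAME_SAMPLES
    return total - DECODER_DELAY if strip_decoder_delay else total


if __name__ == "__main__":
    frames = 100
    print(estimated_duration_samples(frames, strip_decoder_delay=False))
    print(estimated_duration_samples(frames))
```

Under that assumption, the difference between the two estimates is always the 529 samples by which the verifier's comparison is currently off.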