Gapless playback

From Hydrogenaudio Knowledgebase
(https://hydrogenaud.io/index.php/topic,104585.0.html)
 
Latest revision as of 20:39, 21 November 2016

Gapless playback is the seamless playback of sequential audio tracks in digital audio formats. It allows live music or consecutive tracks to be heard exactly as they are mastered, without gaps between tracks.

Example

The typical situation is this:

  1. Someone uses DAE software to extract audio data from an audio CD and save it to separate files, one for each track.
  2. During or after the creation of the files, the audio data is compressed with a lossy codec like MP3.
  3. Upon playback, a click or a brief (fraction-of-a-second) pause can be heard between tracks.

Even if two lossily-compressed tracks are decompressed and merged into a single track, a gap will usually remain between them.

Why gaps occur

There are two main reasons why gaps occur during playback: compression scheme artifacts and poorly designed playback systems.

Compression scheme artifacts

Most lossy audio compression schemes add a small amount of silence to both ends of the audio. Due to the introduction of such gaps, the duration of the output is slightly increased. Silence at the beginning is called delay and silence at the end is padding. Delay can be called encoder delay or decoder delay depending on what part of the compression scheme introduces the delay. Padding normally is only added by the encoder. Common encoder delay values are in a table below.

Many compression schemes involve a time/frequency domain transform (such as an MDCT) which unavoidably introduces a certain amount of silence (part of the encoder delay) at the beginning of the stream. This gap can be enlarged at decode time when the inverse MDCT is performed, because the inverse transform also introduces a gap (decoder delay) of its own.

Another part of the encoder delay and padding is related to the overlapping nature of MDCT transforms; each segment of the encoded audio depends in part on adjacent segments. Therefore, a little bit of extra signal is required at both ends of the input in order to fully and accurately encode the frequencies found in the original ends. The encoder may add a consistent number of silent samples to one or both ends to achieve this.

Yet another factor is the fact that transforms act on data in units of fixed-size blocks. In order to fill up the last block, silence may be appended to the input before the transform. This makes the overall padding amount hard to predict, if the length of the input isn't known.
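
The block-filling effect can be illustrated with a little arithmetic. The following is a minimal sketch of a simplified encoder model, not the behaviour of any particular encoder: it assumes the encoder prepends a fixed delay and then appends only as much silence as is needed to fill the last block. The function name and the default values (576 samples of delay, 1152-sample blocks) are assumptions chosen for illustration.

```python
def padded_layout(input_samples, encoder_delay=576, block_size=1152):
    """Return (total_encoded_samples, padding) for a simplified encoder model.

    Assumption: the encoder prepends `encoder_delay` silent samples and then
    appends just enough silence to fill the last fixed-size block. Real
    encoders may add more padding than this.
    """
    occupied = encoder_delay + input_samples
    blocks = -(-occupied // block_size)  # ceiling division
    total = blocks * block_size
    padding = total - occupied
    return total, padding


# A 3-minute CD track: 180 s * 44100 samples/s = 7938000 samples.
total, padding = padded_layout(180 * 44100)
print(total, padding)
```

Because the padding depends on the input length modulo the block size, it differs from track to track even when the encoder delay is constant.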

If the encoder delay and the padding are not both accurately accounted for, the encoded silence will be decoded together with the audio data, creating gaps at the ends of the track. Likewise, if the decoder delay is not accounted for, the gap will be further enlarged.

This issue is technical but also standards-related. The popular MP3 standard, for example, defines no way to record the amount of delay or padding for later removal. Encoder delay may vary from encoder to encoder, making automatic removal difficult. Some encoders use a nonstandard header to store actual encoder delay & padding values, but not all players/decoders support it. More recent (newer than MP3) compressed audio formats have been designed to address this problem, and can therefore produce gapless audio if played back correctly.

Poorly designed playback systems

Even when the audio file itself does not contain undesirable gaps, software/firmware/hardware design often adds gaps during playback. In some cases, software closes and re-opens the output stream when switching tracks, causing the hardware to create a very short "click". This problem is solved in more sophisticated designs of gapless playback.

A different design problem relates to software/firmware/hardware which are not ready to seamlessly move to the next track by the time the current track is complete. In this scenario, the listener is left waiting in silence as the player locates the next file, reads it, decodes the first blocks if necessary and then starts loading the buffer for playback. The gap can be as much as half a second, or even more — very noticeable in "continuous" music such as certain classical or dance genres.

Many older audio players on personal computers do not implement the required buffering to play gapless audio. Some of these rely on third-party gapless audio plug-ins to buffer output. Some newer players and newer versions of old players now support gapless playback directly.

CD gaps

Gapless playback, as discussed here, is not related to "gaps" or "pregaps" on CDs, at least not directly. Those kinds of gaps are just sections of audio which have been designated on the CD as being the "index 00" portion of each track. They are the audio (often silent or nearly so, but not always) which plays while the CD player counts up from a negative time to 0:00. Some DAE software offers the option of detecting and omitting these sections when ripping. Although this may seem to be a method of creating "gapless" audio, it's merely removing, with only 588-sample (1/75th of a second) precision, chunks of sound based on flags embedded in the CD, not based on how silent the audio actually is. This kind of feature is useful when ripping a CD-R which had been mistakenly burned with 2-second gaps between each track, but when used on a properly mastered commercial CD, it's more likely to just create audible seams where there were none before.

Testing for gapless

The best way to test for gapless playback is by using Test Samples listed at the end of this page.

It's tempting to test gapless playback by generating tracks with pure tones, and encoding them into a lossy format. This is not recommended for two reasons:

  1. Unless the first tone ends at 0 level and the second tone starts at 0 level, a glitch will be heard during transition.
  2. Some decoders chop off the end and/or the start of the audio data. So the playback will be perceived as gapless, while it is actually not.

See the discussion on this HA thread: http://www.hydrogenaudio.org/forums/index.php?showtopic=40995&st=0&p=360548&#entry360548

Optimal solution

It is possible to store metadata in the audio to explicitly declare the playtime, and/or the amount of padding/delays introduced in the encoding process. This information can be used to ensure that playtime will remain constant after decoding with no added silence. The audio playback software must be able to recognize the metadata, and trim the decoded audio as necessary.
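
As a rough illustration of the trimming step, the sketch below removes a known delay and padding from decoded samples. It is only a sketch: `simulate_lossy_roundtrip` is a hypothetical stand-in for a real encode/decode cycle and models nothing but the added silence, and both function names are assumptions made for this example.

```python
def simulate_lossy_roundtrip(samples, encoder_delay, padding):
    """Hypothetical stand-in for encode + decode: only models the added silence."""
    return [0] * encoder_delay + list(samples) + [0] * padding


def trim_decoded(decoded, encoder_delay, padding):
    """Drop the leading delay and trailing padding declared in the metadata."""
    end = len(decoded) - padding
    return decoded[encoder_delay:end]


track_a = [5, 6, 7, 8]
track_b = [9, 10, 11]

decoded_a = simulate_lossy_roundtrip(track_a, encoder_delay=3, padding=2)
decoded_b = simulate_lossy_roundtrip(track_b, encoder_delay=3, padding=1)

# Without trimming, silence sits between the two tracks.
untrimmed = decoded_a + decoded_b

# With trimming, the joined output matches the original material again.
trimmed = trim_decoded(decoded_a, 3, 2) + trim_decoded(decoded_b, 3, 1)
print(trimmed == track_a + track_b)
```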

The software can then take care to keep the output stream open between tracks. It must also buffer the beginning of the following track in the same way it buffers the current track during normal playback.
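
One way to picture this is a single feeder that hands fixed-size chunks to one output stream and carries leftover samples across the track boundary. The sketch below assumes the tracks are already decoded and trimmed; `gapless_chunks` is a hypothetical name, and a real player would work against an audio device API rather than Python lists.

```python
def gapless_chunks(tracks, chunk_size):
    """Yield fixed-size chunks that run straight across track boundaries.

    `tracks` is an iterable of sample sequences (already trimmed). Samples
    left over at the end of one track stay buffered and are completed with
    the beginning of the next track, so the consumer never sees a boundary.
    """
    pending = []
    for track in tracks:
        for sample in track:
            pending.append(sample)
            if len(pending) == chunk_size:
                yield pending
                pending = []
    if pending:
        yield pending  # final, possibly short, chunk


for chunk in gapless_chunks([[1, 2, 3], [4, 5]], chunk_size=2):
    print(chunk)
```

The second chunk contains the last sample of the first track and the first sample of the second one, which is the situation a player that closes and reopens its output stream cannot reproduce.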

Optimal gapless audio is achieved if

  • the compression method supports gapless playback,
  • the software properly decodes the audio data and metadata,
  • the next track is buffered and ready to play, and
  • the output stream remains open between tracks.

A collection of consecutive tracks will then play in the same way they were mastered, allowing the listener to hear their album as the author intended.

How to add or repair gap metadata in MP3s

You will need a wave editor with a proper selection zoom feature (to count samples) and foobar2000 to edit the delay/padding info in the mp3 files.

  1. Decode all files to uncompressed wav format with foobar2000 (this is important, because if you open the mp3 directly with the wave editor its internal decoder may add extra silence at the beginning and/or at the end). You need to convert separate tracks to separate wav files, not to a single wav file.
  2. Determine the encoder delay (number of silent samples at the beginning of each track). This number should be identical for all tracks if they were encoded with the same software. For example, for tracks encoded with Gogo, since it's LAME-based, the delay should be 576 samples, but it wouldn't hurt to check visually in the editor.
  3. For each track that should end with a gapless transition, determine the number of silent samples at the end (the padding). Unfortunately, this procedure cannot be reliably automated, because silent samples may not strictly equal zero (due to the encoder filter ringing). You'll need to visually search for the waveform cut-off, which probably won't be abrupt.
  4. Determine the true length of the track in samples: true length = (raw length) - (encoder delay) - (padding). If the track came from an audio CD, then this number should be a multiple of 588 (the audio CD sector size). If it is not, then either you incorrectly determined the number of padding samples, or the content was not encoded from an audio CD, or possibly it was digitally processed prior to mp3 encoding.
  5. Edit the mp3 info frame with foobar2000 (local menu -> utilities -> edit mp3 gapless playback information). Enter the encoder delay from step 2 into the "encoder delay" field and the true track length from step 4 into the "padding" field. Save the changes.
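
The arithmetic in step 4 can be checked with a few lines of code. This is a minimal sketch; the function names and the sample counts in the example are made up for illustration.

```python
CD_SECTOR_SAMPLES = 588  # one audio CD sector, 1/75th of a second at 44.1 kHz


def true_length(raw_length, encoder_delay, padding):
    """true length = (raw length) - (encoder delay) - (padding), in samples."""
    length = raw_length - encoder_delay - padding
    if length < 0:
        raise ValueError("delay and padding exceed the raw length")
    return length


def looks_like_cd_track(length):
    """True if the length is a whole number of audio CD sectors."""
    return length % CD_SECTOR_SAMPLES == 0


# Illustrative numbers: a decoded file of 7939584 samples with 576 samples
# of encoder delay and 1008 samples of padding measured in the wave editor.
length = true_length(7939584, 576, 1008)
print(length, looks_like_cd_track(length))
```

If the check fails, the possible causes are the ones listed in step 4: a miscounted padding value, a source that was not an audio CD, or processing applied before encoding.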

Common encoder delay values

Encoder | Delay
LAME | 576 samples
iTunes/QuickTime MP3 | 528 samples
Windows Media Player/FhG FastEnc | 672 samples

Additional values for older encoders are given at "lame v3.81 and 3.87 beta mp3 decoding quality test results" (http://mp3decoders.mp3-tech.org/decoders_lame.html).

Alternative solutions

Digital signal processor (DSP) plugins can be used to detect silence between tracks and trim the audio as necessary on playback. This is not an optimal solution because it does not always produce results identical to the source. Sometimes an artist may intentionally leave silence at track boundaries for dramatic effect; removing this silence also removes that effect.

It can also be difficult to properly implement silence removal. If the silence threshold is too low and the track contains decoder artifacts, the software may not recognise some silences. Conversely, if the threshold is too high, the software may remove entire sections of quiet music at the beginning or end of a track.
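
The threshold trade-off can be seen in a toy example. The sketch below is an assumed, simplified silence trimmer (a plain amplitude threshold applied at both ends), not the algorithm of any particular DSP plugin; the sample values are invented.

```python
def trim_silence(samples, threshold):
    """Remove leading and trailing samples whose magnitude is <= threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]


# Low-level decoder artifacts (1, -1) surround quiet music (40, 30)
# and loud music (900, -700).
track = [0, 1, -1, 0, 40, 900, -700, 30, 2, 0, -1]

print(trim_silence(track, 0))   # too low: artifacts are kept as "audio"
print(trim_silence(track, 2))   # removes the artifacts, keeps the quiet music
print(trim_silence(track, 50))  # too high: the quiet music is cut off as well
```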

DSP plugins can also be used to cross-fade between tracks. This eliminates gaps that some listeners find distracting, but also greatly alters the audio data and is not always desirable. In particular, when tracks are meant to be played together and perform the transition at high volume, cross-fading results in a large volume drop.

Both of these alternate solutions are typically used to address compression methods that do not support the metadata for gapless playback. Like the optimal solution, they still require buffering and not closing the output stream; however, they require more computations, making them less efficient. In portable digital audio players, this can mean a reduced playing time on batteries.

Due to the drawbacks of the alternative solutions above, some listeners dislike their negative effects more than the gap they attempt to remove. Another problem is that the solutions above do nothing to prevent the output stream from being closed and reopened at track boundaries; some measures can be taken to simulate a gapless output stream, but they are not always successful and side-effects may occur.

Another alternative is to ignore track boundaries, encoding a single collection of tracks as a single compressed file, relying on cuesheets (or something similar) for navigation. While this method results in gapless playback within the collection of tracks with consecutive playback, it can be unwieldy due to the possibly large size of the resulting compressed file. Furthermore, unless the playback software or hardware can recognize the cue sheets, navigating between tracks may be difficult.

Format support

Since lossless data compression excludes the possibility of the introduction of padding, all lossless audio file formats are inherently gapless. The following lossy audio file formats have provisions for gapless encoding.

  • (Ogg) Vorbis
  • Opus
  • Speex

Some other formats do not officially support gapless encoding, but some implementations of encoders or decoders may handle gapless metadata.

  • LAME-encoded MP3 can be gapless with players that support the LAME Mp3 info tag.
  • AAC in MP4 encoded with Nero Digital from Nero AG can be gapless with foobar2000.
  • AAC in MP4 encoded with iTunes 7.0 can be gapless with iTunes 7.0 and latest foobar2000.

Gapless solutions

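  • foobar2000: Optimal gapless playback
  • Otachan's in_mpg123 (http://www.hydrogenaudio.org/forums/index.php?showtopic=18530): A gapless MP3 decoder for Winamp

Note: Winamp 5.2 implements gapless playback in its built-in MP3 decoder.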

Test samples

  • http://tiffman.com/gtkpod/gapless_WAVPACK_free_of_right.zip: 17 very short gapless samples (2 MB)

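External links

  • Some info about how iTunes/QuickTime writes and handles gapless playback info: http://yabb.jriver.com/interact/index.php?topic=47033.msg322589#msg322589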