AccurateRip: Difference between revisions

From Hydrogenaudio Knowledgebase
(→‎Ripped track checksums: added caveat about AccurateRip being based on "complete" rips)
 
(39 intermediate revisions by 17 users not shown)
Line 1: Line 1:
'''AccurateRip''' is a database that accepts and supplies two things: 1. estimates of the accuracy of the digital audio extraction (DAE) capabilities of CD-ROM/CD-RW drives (specifically, their read offsets), and 2. checksums for audio tracks extracted with those drives.
'''AccurateRip''' is an online ripping accuracy database. Using submissions from people all around the world, it can confirm with a positive match that an audio track has been ripped bit-perfectly. AccurateRip also allows the read offsets of drives to be determined.
 
The database is maintained by Illustrate Ltd., the company run by "Spoon", primary developer of [[dBpoweramp]]. All of the data is submitted by the Windows-based ripping programs dBpoweramp and [[Exact Audio Copy]] via a [[wikipedia:Dynamic-link library|DLL]] licensed from Illustrate.<ref>As per [https://hydrogenaud.io/index.php/topic,77600#msg678407 a Jan. 2010 post by Spoon], "Only two programs submit to AR, EAC and dBpoweramp, these two submit through the standard AccurateRip access dll, which ensures that offsets are correct, we are not talking submissions to freedb where one record does not effect the submission of future records if keyed with a wrong offset. You might not like it, but it is the only way to ensure that the quality of the database is not impaired."</ref> Other rippers and tools, such as [[CUETools]] and Mac OS-based rippers, can receive info from the database in order to look up drive offsets or check rip quality, but they cannot submit their own rip results to it.


==Drive read offsets==
Very few CD drives actually start reading data from audio CDs exactly at the sector requested by DAE software. There are drives that are off by over 1 sector (1/75th of a second), but most are off by much less (<sup>1</sup>/<sub>250</sub> to <sup>1</sup>/<sub>350</sub> second). Most modern CD drives have "Accurate Stream" technology, so there's no "jitter", meaning in this case that the variance is consistent from read to read, and will tend to be the same for all drives of a certain make & model.


The AccurateRip database allows one to find out the read offset, which is a number, for a given make & model of CD drive. This number can then be used by DAE software to ensure that each track is ripped from its exact start to its exact finish.
The AccurateRip database allows one to find out the read offset, which is normally constant for a given make & model of CD drive. This number can then be used by DAE software to ensure that each track is ripped from its exact start to its exact finish.
 
The offset is given in samples. One "sample" on an audio CD is 4 bytes, consisting of a 2-byte left-channel value and a 2-byte right-channel value. There are 2352 bytes, or 588 samples, in each sector of an audio CD, corresponding to 1/75th of a second of sound. Therefore, an AccurateRip offset of +134 means the drive consistently delivers data from 536 bytes ''behind'' (earlier than) where it was asked to read from, so the DAE software needs to look that far ahead (hence the positive offset) in order to get the right data.
 
When offsets are taken into account, the DAE software might have to ask the drive to "overread" into the lead-in or lead-out portions of the disc, where there's no audio data. Some drives can't be asked to do it, some drives will try to do it and fail, and some will just return null samples (a stream of "0" bytes, a.k.a. digital silence). If the drive can't overread, then there will be samples missing from the extracted track. The DAE software can correct for this by padding the track with digital silence so it's the correct length.


The offset is given in samples. One "sample" on an audio CD is 4 bytes, consisting of a 2-byte left-channel value and a 2-byte right-channel value. There are 2352 bytes, or 588 samples, in each sector of an audio CD, corresponding to 1/75th of a second of sound. Therefore, an AccurateRip offset of +134, for example, means the drive consistently delivers data from 536 bytes ''behind'' (earlier than) where it was asked to read from, so the DAE software needs to look that far ahead (hence the positive offset) in order to get the right data.
===Offset accuracy===
Determining the actual read offset of a drive is difficult. The reference measurements that Andre Wiethoff made, and that Spoon adopted for the AccurateRip database, were challenged in late 2006 [https://web.archive.org/web/20120903210117/digital-inn.de/exact-audio-copy-english/28787-andre-wiethoff-who-feels-have-say-offsets.html] by a claim that the reference is actually off by 30 samples. The offsets are 30 samples too low or, equivalently, the correction values are 30 samples too high. Wiethoff feels that it's too late to change to a different reference now that the database is populated. However, since all submissions are calibrated to the same reference, and the apparent error is so small (~680 µs), the reliability of AccurateRip data is not compromised.


==Ripped track checksums==
Once all the samples for a track have been extracted and put into a file such as a WAV, a checksum can be generated to summarize the sample data. Identical data will produce identical checksums. If the data is the slightest bit different, the checksums will usually be very different. The checksums derived from the same tracks from the same pressings of the same CDs, so long as drive offsets have been accounted for, can be compared in order to determine whether the extraction was error-free. That is, if you rip a track and find that your checksum matches what everyone else got, then you can be confident there are no missing or incorrect samples (or that you've all got exactly the same damage, which is nearly impossible). See the [[secure ripping]] article for more on this subject.


The AccurateRip database contains checksum data for the tracks on thousands of CDs. DAE software can use this info to decide whether to try re-reading a track that produced a different checksum than was expected.
The AccurateRip database contains checksum data for over 2 million unique discs. DAE software can use this info to decide whether to try re-reading a track that produced a different checksum than was expected.
 
===Checksum calculation===
Technical details on checksum calculation can be found here: [https://forum.dbpoweramp.com/showthread.php?20641 AccurateRip-CRC-Calculation]
 
A Linux command-line tool to compute AccurateRip checksums can be found here: https://github.com/leo-bogert/accuraterip-checksum
 
Each AccurateRip checksum is based on a complete track rip, from the beginning of the track to the end, as determined by the track's entry in the disc's table of contents. This means that any silence or "gap" at the beginning or end of the track, except before the first track, must be included in the rip. If your DAE software is configured to trim silence or to do anything with gaps other than put them at the end of the preceding track (as normally happens when gaps aren't taken into account), then it's unlikely the checksums of your rips will be submitted to or compared against those in the AccurateRip database.
 
The checksum algorithm ignores the first 2939 samples<ref name=Steffensen>Not 2940 samples; see http://jonls.dk/2009/10/calculating-accuraterip-checksums/</ref> (just under 5 frames, a little under 0.067s of audio) at the beginning of the first track, and ignores 2940 samples (exactly 5 frames) at the end of the last track. The largest drive offset in the database, as of late 2011, is only 1776 samples, so when an offset-corrected drive "overreads" beyond the boundaries of the audio data, it normally won't affect the checksum. If the overread exceeds 2940 samples, though, the algorithm assumes the data is padded with digital silence (nothing but zeroes). If your drive isn't capable of overreading and your DAE software doesn't zero-pad the rip to simulate an overread, then it's unlikely the checksums of your rips will be submitted to or compared against those in the AccurateRip database.
 
An optimization oversight in the original AccurateRip checksum algorithm results in an unintended loss of accuracy: about 3% of the audio data is not counted in the checksum at all. The left channel's samples are fully included, but in a 65,536-sample cycle, half of the right-channel samples are treated as if they're missing anywhere from 1 bit to all 16 bits. [https://hydrogenaud.io/index.php/topic,61468 Proposals for improving the algorithm, the database, and the database's API were made], and an improved algorithm was implemented for AccurateRip v2 checksums, which are now calculated by default by the rippers that support AccurateRip. How much the v2 checksum improves on the original [http://www.hydrogenaudio.org/forums/index.php?showtopic=66233&st=1350&p=756260&#entry756260 has been questioned]. The v2 checksums are treated as separate pressings from the old ones, so there's no risk to the integrity of existing data.
 
==Pressings==
For AccurateRip's purposes, different pressings arise when a pressing plant creates a batch of identical audio CDs and then, usually at a later date, creates another batch with the same data but a different offset. The offset is due to variations in the way the master discs are replicated for the pressing machinery. Apart from this offset, discs across pressings are identical; the audio data on them is exactly the same. AccurateRip, as it was originally written, could only verify results for a specific pressing (same data, same offset), though the program's calibration procedure still made checking against alternate pressings possible. A major update to the program CUETools offered a mathematical solution to quickly cross-check a rip against multiple pressings. AccurateRip v2 would later be released to incorporate cross-checking against alternate pressings as well.
 
==AccurateRip data analysis==
It is possible to run statistical analysis on the hundreds of millions of submissions to AccurateRip to determine how accurate a given drive is, with the idea that with enough spread of drive submissions, anomalous results from damaged CDs will be averaged out. The last such calculation can be found here: [https://forum.dbpoweramp.com/showthread.php?23074 CD-DVD-Drive-Accuracy-List-2011]


The AccurateRip checksums are based on complete track rips; they include pre-track gaps and digital silence at the beginning and end of tracks. They also assume that when an offset-corrected drive "overreads" beyond the boundaries of the disc's audio data, the track is padded with digital silence. So if your DAE software is configured to trim silence, to not zero-pad overreads (if necessary), or to do anything with gaps other than put them at the end of the preceding track (as normally happens when gaps aren't taken into account), then it's unlikely the checksums of your rips will be compared against or submitted to those in the AccurateRip database.
There is also a report of drive offsets here: [http://www.accuraterip.com/driveoffsets.htm CD Drive Offsets]


==Submitting offsets and checksums==
The database is designed to be anonymously accessed by DAE software, via HTTP. Submitting data should only be done through such software.
==Software that uses AccurateRip==
* [[CUETools]]
* [https://github.com/cyanreg/cyanrip cyanrip]
* [[dBpoweramp]]
* [[Exact Audio Copy]]
* [[foobar2000]]
* [[fre:ac]]
* [[MediaMonkey]]
* [https://github.com/thomasvs/morituri morituri]
* [[Python Audio Tools]]
* [[Rip]]
* [[Songbook]]
* [[Whipper]]
* [[XLD]]
==History==
AccurateRip was introduced as an add-on for dBpowerAMP Music Converter (dMC) in January 2003.
<!--expand this-->
==Notes and references==
<references/>


==External links==
* [http://www.accuraterip.com/ AccurateRip web site]
[[Category:CD ripping]]

Latest revision as of 09:12, 24 November 2021

AccurateRip is an online ripping accuracy database. Using submissions from people all around the world, it can confirm with a positive match that an audio track has been ripped bit-perfectly. AccurateRip also allows the read offsets of drives to be determined.

The database is maintained by Illustrate Ltd., the company run by "Spoon", primary developer of dBpoweramp. All of the data is submitted by the Windows-based ripping programs dBpoweramp and Exact Audio Copy via a DLL licensed from Illustrate.[1] Other rippers and tools, such as CUETools and Mac OS-based rippers, can receive info from the database in order to look up drive offsets or check rip quality, but they cannot submit their own rip results to it.

Drive read offsets

Very few CD drives actually start reading data from audio CDs exactly at the sector requested by DAE software. There are drives that are off by over 1 sector (1/75th of a second), but most are off by much less (1/250 to 1/350 second). Most modern CD drives have "Accurate Stream" technology, so there's no "jitter", meaning in this case that the deviation is consistent from read to read, and will tend to be the same for all drives of a certain make & model.

The AccurateRip database allows one to find out the read offset, which is normally constant for a given make & model of CD drive. This number can then be used by DAE software to ensure that each track is ripped from its exact start to its exact finish.

The offset is given in samples. One "sample" on an audio CD is 4 bytes, consisting of a 2-byte left-channel value and a 2-byte right-channel value. There are 2352 bytes, or 588 samples, in each sector of an audio CD, corresponding to 1/75th of a second of sound. Therefore, an AccurateRip offset of +134 means the drive consistently delivers data from 536 bytes behind (earlier than) where it was asked to read from, so the DAE software needs to look that far ahead (hence the positive offset) in order to get the right data.
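
The unit conversions in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the function name describe_offset and the constant names are made up for this example and do not come from any ripping program.

```python
# Audio CD geometry as stated in the text above.
BYTES_PER_SAMPLE = 4          # 2 bytes left channel + 2 bytes right channel
SAMPLES_PER_SECTOR = 588      # 2352 bytes per sector / 4 bytes per sample
SECTORS_PER_SECOND = 75
SAMPLE_RATE = SAMPLES_PER_SECTOR * SECTORS_PER_SECOND  # 44100 samples per second


def describe_offset(offset_samples):
    """Convert a read offset in samples to bytes, sectors and microseconds.

    Hypothetical helper for illustration only.
    """
    return {
        "bytes": offset_samples * BYTES_PER_SAMPLE,
        "sectors": offset_samples / SAMPLES_PER_SECTOR,
        "microseconds": offset_samples * 1_000_000 / SAMPLE_RATE,
    }


info = describe_offset(134)
print(info["bytes"])  # 536, matching the +134 example in the text
```

A +134 offset is therefore well under one sector (134 of 588 samples).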

When offsets are taken into account, the DAE software might have to ask the drive to "overread" into the lead-in or lead-out portions of the disc, where there's no audio data. Some drives can't be asked to do it, some drives will try to do it and fail, and some will just return null samples (a stream of "0" bytes, a.k.a. digital silence). If the drive can't overread, then there will be samples missing from the extracted track. The DAE software can correct for this by padding the track with digital silence so it's the correct length.
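
A minimal sketch of the padding step, assuming the simplest case: the drive cannot overread, so the only data available is what it delivered for the requested range. The function name apply_read_offset is hypothetical, and real DAE software would normally fetch the missing samples from neighbouring sectors whenever they exist on the disc.

```python
def apply_read_offset(raw, offset, silence=0):
    """Shift delivered samples by the drive's read offset, zero-padding the gap.

    raw    -- list of samples exactly as the drive delivered them
    offset -- read offset in samples (positive: the drive delivered data
              from `offset` samples too early)
    Hypothetical helper; models a drive that cannot overread.
    """
    n = len(raw)
    if offset >= 0:
        # Drop the leading samples that belong before the track start,
        # then pad the end with digital silence to keep the length.
        return raw[offset:] + [silence] * min(offset, n)
    # Negative offset: the drive delivered data from too late, so the
    # missing samples are at the start of the track.
    shift = -offset
    return [silence] * min(shift, n) + raw[:max(n - shift, 0)]


print(apply_read_offset([1, 2, 3, 4, 5], 2))   # [3, 4, 5, 0, 0]
print(apply_read_offset([1, 2, 3, 4, 5], -2))  # [0, 0, 1, 2, 3]
```

In both directions the result has the same length as the input, which is the point of the padding.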

Offset accuracy

Determining the actual read offset of a drive is difficult. The reference measurements that Andre Wiethoff made, and that Spoon adopted for the AccurateRip database, were challenged in late 2006 (https://web.archive.org/web/20120903210117/digital-inn.de/exact-audio-copy-english/28787-andre-wiethoff-who-feels-have-say-offsets.html) by a claim that the reference is actually off by 30 samples. The offsets are 30 samples too low or, equivalently, the correction values are 30 samples too high. Wiethoff feels that it's too late to change to a different reference now that the database is populated. However, since all submissions are calibrated to the same reference, and the apparent error is so small (~680 µs), the reliability of AccurateRip data is not compromised.

Ripped track checksums

Once all the samples for a track have been extracted and put into a file such as a WAV, a checksum can be generated to summarize the sample data. Identical data will produce identical checksums. If the data is the slightest bit different, the checksums will usually be very different. The checksums derived from the same tracks from the same pressings of the same CDs, so long as drive offsets have been accounted for, can be compared in order to determine whether the extraction was error-free. That is, if you rip a track and find that your checksum matches what everyone else got, then you can be confident there are no missing or incorrect samples (or that you've all got exactly the same damage, which is nearly impossible). See the secure ripping article for more on this subject.

The AccurateRip database contains checksum data for over 2 million unique discs. DAE software can use this info to decide whether to try re-reading a track that produced a different checksum than was expected.

Checksum calculation

Technical details on checksum calculation can be found here: AccurateRip-CRC-Calculation (https://forum.dbpoweramp.com/showthread.php?20641)

A Linux command-line tool to compute AccurateRip checksums can be found here: https://github.com/leo-bogert/accuraterip-checksum

Each AccurateRip checksum is based on a complete track rip, from the beginning of the track to the end, as determined by the track's entry in the disc's table of contents. This means that any silence or "gap" at the beginning or end of the track, except before the first track, must be included in the rip. If your DAE software is configured to trim silence or to do anything with gaps other than put them at the end of the preceding track (as normally happens when gaps aren't taken into account), then it's unlikely the checksums of your rips will be submitted to or compared against those in the AccurateRip database.

The checksum algorithm ignores the first 2939 samples[2] (just under 5 frames, a little under 0.067s of audio) at the beginning of the first track, and ignores 2940 samples (exactly 5 frames) at the end of the last track. The largest drive offset in the database, as of late 2011, is only 1776 samples, so when an offset-corrected drive "overreads" beyond the boundaries of the audio data, it normally won't affect the checksum. If the overread exceeds 2940 samples, though, the algorithm assumes the data is padded with digital silence (nothing but zeroes). If your drive isn't capable of overreading and your DAE software doesn't zero-pad the rip to simulate an overread, then it's unlikely the checksums of your rips will be submitted to or compared against those in the AccurateRip database.
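
The skipping rule above can be sketched as follows. The position-weighted sum itself is not spelled out in this article; it follows the commonly published description of the original (v1) checksum and should be treated as an assumption here, as should the packing of each stereo sample into one 32-bit value with the left channel in the low 16 bits. The names accuraterip_v1 and pack_sample are made up for this example.

```python
SAMPLES_PER_FRAME = 588
SKIP = 5 * SAMPLES_PER_FRAME  # 2940 samples, i.e. exactly 5 frames


def pack_sample(left, right):
    """Pack two 16-bit channel values into one 32-bit sample (assumed layout)."""
    return ((right & 0xFFFF) << 16) | (left & 0xFFFF)


def accuraterip_v1(samples, is_first_track=False, is_last_track=False):
    """Sketch of the original checksum over one complete track.

    samples -- sequence of 32-bit unsigned integers, one per stereo sample
    """
    n = len(samples)
    # Positions are counted from 1. On the first track, positions 1..2939
    # are ignored; on the last track, the final 2940 positions are ignored.
    check_from = SKIP if is_first_track else 1
    check_to = n - SKIP if is_last_track else n
    crc = 0
    for position, value in enumerate(samples, start=1):
        if check_from <= position <= check_to:
            # Multiply-accumulate, truncated to 32 bits.
            crc = (crc + position * value) & 0xFFFFFFFF
    return crc


track = [pack_sample(i, -i) for i in range(6000)]
print(f"{accuraterip_v1(track, is_first_track=True):08X}")
```

With this rule, changing any of the first 2939 samples of the first track leaves its checksum unchanged, while changing the sample at position 2940 does not.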

An optimization oversight in the original AccurateRip checksum algorithm results in an unintended loss of accuracy: about 3% of the audio data is not counted in the checksum at all. The left channel's samples are fully included, but in a 65,536-sample cycle, half of the right-channel samples are treated as if they're missing anywhere from 1 bit to all 16 bits. Proposals for improving the algorithm, the database, and the database's API were made, and an improved algorithm was implemented for AccurateRip v2 checksums, which are now calculated by default by the rippers that support AccurateRip. How much the v2 checksum improves on the original has been questioned (see http://www.hydrogenaudio.org/forums/index.php?showtopic=66233&st=1350&p=756260&#entry756260). The v2 checksums are treated as separate pressings from the old ones, so there's no risk to the integrity of existing data.
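
The "about 3%" figure can be reproduced with a short calculation, under the assumption that the oversight is a 32-bit truncation of position × sample, with the right channel held in the high 16 bits of each 32-bit sample. Under that assumption, a position with k trailing zero bits shifts the top k bits of the sample out of the result. The helper names below are made up for this example.

```python
def lost_high_bits(position):
    """High bits of a 32-bit sample that cannot affect (position * sample) mod 2**32.

    Equals the number of trailing zero bits in position (position >= 1).
    """
    trailing_zeros = (position & -position).bit_length() - 1
    return min(trailing_zeros, 32)


def v1_term(position, sample):
    """One term of the truncated multiply-accumulate (assumed v1 behaviour)."""
    return (position * sample) & 0xFFFFFFFF


CYCLE = 65536
lost = sum(lost_high_bits(p) for p in range(1, CYCLE + 1))
fraction_lost = lost / (CYCLE * 32)   # 32 bits of audio data per sample
print(f"{fraction_lost:.2%}")

# At position 65536 the whole right channel (high 16 bits) drops out:
print(v1_term(65536, 0xABCD1234) == v1_term(65536, 0x00001234))  # True
```

Odd positions lose nothing, which is why only half of the samples in the cycle are affected; on average roughly one bit in 32 goes uncounted.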

Pressings

For AccurateRip's purposes, different pressings arise when a pressing plant creates a batch of identical audio CDs and then, usually at a later date, creates another batch with the same data but a different offset. The offset is due to variations in the way the master discs are replicated for the pressing machinery. Apart from this offset, discs across pressings are identical; the audio data on them is exactly the same. AccurateRip, as it was originally written, could only verify results for a specific pressing (same data, same offset), though the program's calibration procedure still made checking against alternate pressings possible. A major update to the program CUETools offered a mathematical solution to quickly cross-check a rip against multiple pressings. AccurateRip v2 would later be released to incorporate cross-checking against alternate pressings as well.
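
To illustrate what cross-checking against another pressing involves, here is a deliberately naive brute-force sketch: shift the track window over the ripped disc data one sample at a time and recompute the checksum until it matches a known value. This is not the mathematical shortcut used by CUETools, it ignores the first-track and last-track skipping rules, and the position-weighted sum is an assumed form of the original checksum. All names are hypothetical.

```python
def v1_checksum(samples):
    """Position-weighted 32-bit sum (assumed form of the original checksum)."""
    total = 0
    for position, value in enumerate(samples, start=1):
        total = (total + position * value) & 0xFFFFFFFF
    return total


def find_pressing_offset(disc, start, length, target, max_shift):
    """Return the shift (in samples) at which the track matches target, or None.

    disc      -- all samples ripped from the disc, as 32-bit integers
    start     -- index where the track starts in our own rip
    length    -- track length in samples
    target    -- checksum recorded for the other pressing
    max_shift -- largest shift to try in either direction
    """
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + length > len(disc):
            continue  # window would fall outside the ripped data
        if v1_checksum(disc[lo:lo + length]) == target:
            return shift
    return None


# Toy data: silence with a single non-zero sample.
disc = [0] * 200
disc[60] = 1
other_pressing = v1_checksum(disc[57:157])   # same data, shifted by 7 samples
print(find_pressing_offset(disc, 50, 100, other_pressing, 20))  # 7
```

Each candidate shift costs a full pass over the track, which is why a faster method matters when many pressings and offsets have to be checked.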

AccurateRip data analysis

It is possible to run statistical analysis on the hundreds of millions of submissions to AccurateRip to determine how accurate a given drive is, with the idea that with enough spread of drive submissions, anomalous results from damaged CDs will be averaged out. The last such calculation can be found here: CD-DVD-Drive-Accuracy-List-2011 (https://forum.dbpoweramp.com/showthread.php?23074)

There is also a report of drive offsets here: CD Drive Offsets (http://www.accuraterip.com/driveoffsets.htm)

Submitting offsets and checksums

The database is designed to be anonymously accessed by DAE software, via HTTP. Submitting data should only be done through such software.

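Software that uses AccurateRip

  * CUETools
  * cyanrip (https://github.com/cyanreg/cyanrip)
  * dBpoweramp
  * Exact Audio Copy
  * foobar2000
  * fre:ac
  * MediaMonkey
  * morituri (https://github.com/thomasvs/morituri)
  * Python Audio Tools
  * Rip
  * Songbook
  * Whipper
  * XLD
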
History

AccurateRip was introduced as an add-on for dBpowerAMP Music Converter (dMC) in January 2003.


Notes and references

  1. As per a Jan. 2010 post by Spoon, "Only two programs submit to AR, EAC and dBpoweramp, these two submit through the standard AccurateRip access dll, which ensures that offsets are correct, we are not talking submissions to freedb where one record does not effect the submission of future records if keyed with a wrong offset. You might not like it, but it is the only way to ensure that the quality of the database is not impaired."
  2. Not 2940 samples; see http://jonls.dk/2009/10/calculating-accuraterip-checksums/


External links

  * AccurateRip web site (http://www.accuraterip.com/)