Create a long-term archive

From Hydrogenaudio Knowledgebase
Revision as of 14:38, 8 December 2006 by Pepoluan (Talk | contribs)


This is still being discussed in this HA thread. Feel free to barge in. The results of the discussion will be generalized and put into this article.

Why do we need a long-term archive?

Prior to the invention of the Compact Disc, audio was stored on analog media such as vinyl records, magnetic reels, and cassette tapes. These media are very prone to environmental damage; e.g. vinyl may be scratched, and magnetic media may be crinkled or demagnetized. Even the CD itself is not perfect. A scratch will damage the media, introducing audible errors into its audio tracks. And the error protection of Red Book CDs is not perfect; sometimes errors cannot be corrected, and it is up to the player to 'repair' them by interpolation.

On the other hand, audio encoding technology (and data storage technology) has progressed to the point where we can (somewhat) easily make perfect (or near-perfect) copies of all audio tracks in the world, with proper error detection and correction.

So, for posterity (and personal enjoyment well into your old age), it is very plausible -- and feasible -- to create a long-term archive of your audio collection.

Considerations for creating a long-term archive

Making a long-term archive is not something to be lightly undertaken. You must plan it. Here we attempt to provide you with a general guide to making your long-term audio archive.


First of all, you must decide on how to replicate (i.e. copy) your audio tracks into your computer.

If your source is an Audio CD, it's rather easy. Use a freely available secure ripper such as EAC or CDex to rip your CD into WAV (or lossless) files. The accuracy of a rip can be checked further against the AccurateRip database (unfortunately, the AccurateRip database is not complete; your CD may not be in it).

If your source is analog, then things get more complicated:

  1. First of all, you must ensure that your source is not damaged in any way.
  2. Then, you must find a tunable player to ensure faithful reproduction of the audio track.
  3. Next, you must have an audio card capable of high-quality recording.
  4. After that, you must connect the player to your audio card through a quality connection; by "connection", I also mean active elements such as filters, EQs, and amps.
  5. Finally, you must use a good wave-recording program to record the incoming audio into a huge WAV file.


The second consideration is the encoding used. Although you can use WAV files (i.e. PCM) to store your replicated audio tracks, this is not recommended for several reasons:

  • No tagging capability -- although information on the tracks may be stored in text files, it is much more practical to store them in tags within the encoded file itself.
  • No error detection -- this is vital for long-term archives; if an audio track develops an error, you need to notice it so that you can discard the track and (hopefully) restore it from a backup, or from a backup of the backup.
  • Big size -- since WAV files are uncompressed, storing audio tracks as WAV requires a much greater amount of media.
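
The error-detection point can be illustrated with a small sketch. It is not part of the original discussion: it assumes you keep a checksum list ("manifest") next to the archive, and the function names `file_sha256`, `build_manifest`, and `find_damaged` are made up for this example. Lossless formats usually carry checks of their own (FLAC, for instance, stores an MD5 signature of the original audio in each file), so this is only a format-independent supplement.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file's path (relative to archive_dir) to its digest."""
    root = Path(archive_dir)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_damaged(archive_dir, manifest):
    """Return the files that are missing or no longer match the manifest."""
    root = Path(archive_dir)
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_sha256(path) != expected:
            damaged.append(name)
    return damaged
```

Build the manifest once when the archive is written, store it somewhere safe (for example as a text file on a different medium), and run `find_damaged` whenever you check the archive. Any file it reports should be restored from a backup.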

Most modern lossless formats should suffice. The most popular formats for long-term archiving seem to be FLAC, WavPack, and Monkey's Audio. However, other formats may provide better compression.

We strongly advise you not to use lossy encoding. With lossy encoding, you will not end up with a bit-for-bit identical copy of the source in your archive.
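
One way to check the "bit-for-bit identical" property is to decode an archived file back to WAV and compare its audio data with the original rip. The sketch below assumes the decoding has already been done with the codec's own decoder (not shown), and the names `pcm_digest` and `is_bit_identical` are hypothetical. It compares only the audio format and the samples, so differences in other WAV header chunks are ignored.

```python
import hashlib
import wave


def pcm_digest(wav_path):
    """Hash the audio format and sample data of a WAV file."""
    with wave.open(str(wav_path), "rb") as wav:
        # Channel count, sample width in bytes, and sample rate must match too.
        header = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        digest = hashlib.sha256(repr(header).encode("ascii"))
        while True:
            frames = wav.readframes(65536)
            if not frames:
                break
            digest.update(frames)
    return digest.hexdigest()


def is_bit_identical(original_wav, decoded_wav):
    """True if both WAV files hold exactly the same samples in the same format."""
    return pcm_digest(original_wav) == pcm_digest(decoded_wav)
```

A file that went through a lossless codec should pass this check; a file that went through a lossy codec will generally not.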


The third consideration is which media to use. There are a lot of usable media out there, so let's go over them one by one.

(One requirement is that the media must be big enough to store at least one CD's worth of audio tracks.)
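
To put a number on "one CD's worth", here is a rough calculation based on the Red Book audio parameters (44.1 kHz, 16-bit, stereo). The function names and the `compression_ratio` parameter are assumptions made for this sketch; the real ratio depends on the codec and the music, so measure it on your own files.

```python
# Red Book CD audio: 44,100 samples per second, 2 channels, 2 bytes per sample.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2


def cd_audio_bytes(minutes):
    """Size in bytes of uncompressed CD audio of the given length."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE  # 176,400
    return int(minutes * 60 * bytes_per_second)


def discs_per_medium(medium_bytes, minutes_per_disc=74, compression_ratio=1.0):
    """How many full CDs fit on a medium of medium_bytes bytes.

    compression_ratio is compressed size / original size (1.0 means
    uncompressed); it is a placeholder, not a measured value.
    """
    per_disc = cd_audio_bytes(minutes_per_disc) * compression_ratio
    return int(medium_bytes // per_disc)


print(cd_audio_bytes(74))  # 783216000
print(cd_audio_bytes(80))  # 846720000
```

So a full 74-minute disc is about 783 MB as uncompressed PCM. Note that a data disc stores fewer user bytes per sector than an audio disc, so an uncompressed rip of a full CD does not fit on a data CD-R of the same nominal length.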

Media, with pros and cons:

(medium not named)
  Pros:
  • Cheapest per disc
  Cons:
  • Vulnerable to environment damage
  • Vulnerable to optical damage

(medium not named)
  Pros:
  • Great capacity
  Cons:
  • Rather expensive
  • Media damages easily

Memory cards
  Pros:
  • Very practical size
  • May be playable in portable digital audio players
  • Fast and easy seeking
  Cons:
  • Longevity relatively unknown
  • Expensive per megabyte

Hard disks
  Pros:
  • Honkin' big capacity
  • Very fast for reading/writing
  Cons:
  • Big & heavy
  • Impractical -- rather complex to mount/unmount
  • Electromechanical components