QOA

From Hydrogenaudio Knowledgebase
Revision as of 15:54, 8 April 2024 by Porcuswiki (talk | contribs) (With ffmpeg 7.0, it has so many implementations that it might be worth a wiki page.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

The Quite OK Audio Format (QOA) is a niche lossy audio codec created by Dominic Szablewski in 2023. It designed to be, in the author's words, lossy, simple and quite ok: "simple" to computers in terms of decoding speed (intended to close in on ADPCM speeds at better quality[1]) and to humans in terms of the format specification, which fits a single page[2]. It is not intended to compete with current lossy encoders in terms of near-transparency at low bitrates – indeed, "low" bitrates are arguably not possible at the sample rates that yield "quite ok" quality.

QOA is available as a free and open-source (MIT licensed) library. Its github repository[3] links to alternative implementations; also, ffmpeg can decode it (from version 7.0). A third-party foobar2000 QOA playback component is available (note the warning given).

Properties

(Disclaimer: Part of the following is taken from the author's initial presentation[4], which pre-dates the current version.)

  • QOA is a CBR format, spending 3.2 bytes per sample per channel, each channel encoded separately. In addition, there is a frame header per 5120 interchannel sample, and a file header.
  • Bit rate is then defined by sample rate, which could be from 1 to 224 Hz in integer steps, and by channel count. For 44.1 kHz stereo, bit rate will then amount to 277 kbit/s.
  • Multichannel support: Format allows for 255 channels. A decoder is required to support 1 to 8 channels, possibly varying over frames (channel allocation coinciding with FLAC frames' channel allocation).
  • Encoding method: Like several audio formats, it works with a predictor from past samples (fourth order). The residual – i.e. the difference between "predicted" and "actual" sample value in the source signal – is then replaced by one out of eight hard-coded values (this is the lossy part).

The reference implementation is 400 lines of C code. The format can encode fast, but it is also possible to try to improve quality by spending computing effort on noise shaping upon encoding, selecting the residual value in a more clever way than naive roundoff[5].

References