BS.1387

Revision as of 15:07, 10 August 2023

ITU-R Recommendation BS.1387 is the document that defines Perceptual Evaluation of Audio Quality (PEAQ), an objective technique for measuring the quality of encoded/decoded audio files. It contrasts with the more commonplace subjective testing methodology, which uses ABX and ABC/HR reference testing and is frequently preferred by hydrogenaudio. PEAQ returns an "ODG" rating, which is intended to match the difference in subjective (1–5) scores between the two input samples.

Structure

PEAQ has two versions: basic and advanced. The basic version uses only an FFT-based ear model and is easier to compute. The advanced version uses both an FFT-based and a filter-bank-based ear model and is expected to be more accurate.

History

BS.1387 was initially published in 1998. It was updated to BS.1387-1 in 2001 and BS.1387-2 in 2023.

BS.1387-1 includes technical corrections that are needed to meet the standard's own conformance criteria.^[1]
BS.1387-2 appears to make no substantive changes apart from the removal of references to BS.1115, the addition of a table of contents, and extensive reformatting.

EAQUAL

EAQUAL (Evaluation Of Audio Quality) is an open-source program that implements only PEAQ's basic model.

Invoking EAQUAL

As of version 0.1.3alpha, the -h argument shows how to use eaqual (e.g. eaqual -h).

To compare a test wave file to a reference wave file, one can use for example: eaqual -fref ref.wav -ftest test.wav.
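For batch comparisons, the invocation above can be wrapped in a small script. The following is a minimal Python sketch, assuming eaqual is installed and on the PATH. The helper names build_eaqual_command and run_eaqual are hypothetical, and only the -fref and -ftest arguments quoted above are used. The output is returned as raw text because its exact layout is not described here.

```python
import shutil
import subprocess


def build_eaqual_command(ref_path, test_path, exe="eaqual"):
    """Return the argument list for comparing test_path against ref_path."""
    return [exe, "-fref", ref_path, "-ftest", test_path]


def run_eaqual(ref_path, test_path, exe="eaqual"):
    """Run eaqual and return its raw standard output as text."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    result = subprocess.run(
        build_eaqual_command(ref_path, test_path, exe),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.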

Interpreting EAQUAL output

EAQUAL outputs one score, the PEAQ "ODG" rating. The ODG (Objective Difference Grade) is designed by the ITU to match an SDG (Subjective Difference Grade), which is the difference between the subjective (1–5) scores of the two input samples. Assuming the HydrogenAudio subjective scoring system, where the reference sample is always scored as a perfect 5, adding 5 points to the ODG should produce an approximation of the subjective score.
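The conversion described above is a single addition. A minimal Python sketch follows; the function name approx_subjective_score is hypothetical, and the result is only as good as the assumption that the reference would be rated a perfect 5.

```python
def approx_subjective_score(odg):
    """Approximate a 1-5 subjective score from an ODG by adding 5.

    Assumes the reference sample is scored as a perfect 5, as in the
    HydrogenAudio scoring convention described above.
    """
    return odg + 5.0


# Example: an ODG of -1.5 corresponds to an approximate score of 3.5.
print(approx_subjective_score(-1.5))
```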

Status of the project

Development of EAQUAL was halted in 2002 due to patent concerns. This is not a problem for PEAQ compliance, however, since the 2001 BS.1387-1 does not differ substantially from the 2023 version.

The ITU patent declaration system does not list any specific PEAQ patent by number. However, no new patents have been added since 1998, so any patent should have expired by 2018.^[2]

Versions of EAQUAL include:

EAQUAL Sourcecode: Linux archive of the C code used to implement EAQUAL, provided by Gabriel Bouvigne and mirrored on GitHub by spxnn.
EAQUAL Tools: zip archive of the utility used to perform EAQUAL tests, provided by Rarewares.
ivan-codelegs GitHub fork: adds macOS support.

GstPEAQ

GstPEAQ is an implementation of PEAQ, both basic and advanced, in GStreamer. In addition to the ODG, it also outputs the distortion index (DI), which is not clipped at extremes and not fitted to score anchors. On the HA multiformat dataset:

The advanced model gives a correlation improvement of ~0.2 over basic;
DI is slightly better at predicting subjective scores than ODG, with a correlation improvement of ~0.03.
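To illustrate why an unclipped DI can carry more information than the ODG, the sketch below assumes that the final mapping from DI to ODG is a scaled sigmoid, ODG = bmin + (bmax - bmin) * sig(DI). The constants bmin = -3.98 and bmax = 0.22 are an assumption here and should be verified against the text of BS.1387 before being relied on. The function name odg_from_di is hypothetical and is not part of GstPEAQ.

```python
import math

# Assumed constants; verify against BS.1387 before relying on them.
B_MIN = -3.98
B_MAX = 0.22


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def odg_from_di(di):
    """Map a distortion index to an ODG with the assumed scaled sigmoid."""
    return B_MIN + (B_MAX - B_MIN) * _sigmoid(di)


# Large changes in DI at the extremes barely move the ODG.
for di in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(di, round(odg_from_di(di), 3))
```

Under this assumption, two samples with very different DI values near either extreme receive almost the same ODG, which is one plausible reason for DI correlating slightly better with subjective scores.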

Comparison with subjective listening tests

EAqual results for the AAC@128v2 listening test: fair Pearson correlation (0.699) among higher-quality samples, all of them AAC.
PEAQ done right, allegedly || Multiformat correlation: great Pearson correlation (0.924, using the advanced model's DI) across samples from three quality groups.
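The Pearson correlations quoted above compare a metric's output with listener scores for the same samples. A minimal Python sketch of that calculation follows. The function name pearson is hypothetical, and the numbers in the usage example are made-up placeholders, not data from either listening test.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (not real test results).
objective = [-0.4, -1.1, -2.3, -3.0]
subjective = [4.7, 4.1, 2.9, 2.2]
print(pearson(objective, subjective))
```

A value near 1 means the metric ranks and spaces the samples much as listeners did; it says nothing about whether the absolute scores agree.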

HA comparisons between PEAQ and human raters remain inconclusive. PEAQ is considered useful as an approximation of human hearing in codec development and research, but concrete results still need human participation.

Other implementations

PEAQ-Basic is simple enough to have many implementations.

peaqb is another implementation of PEAQ, last updated in 2003.
There are a good number of Matlab implementations for researchers, but expect some academic code smell.

Other objective metrics

PEAQ is not the end. There are other metrics:^[3]

ITU also has PESQ and POLQA, both designed for speech.
VISQOL is Google's open-source metric. It works for both speech and music, but the neural network (which runs fast enough on a CPU) is trained for short clips only. Someone could write a tool to split one file into many clips and report individual segment scores.
CDPAM is allegedly the model that comes closest to human datasets. Unfortunately, the only pre-trained model works at 22050 Hz, and it is academic neural-network code, so it should not be expected to run on the first try.
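Splitting a long file into short clips, as suggested for VISQOL above, needs only the Python standard library for plain PCM WAV input. The sketch below is one possible approach. The function name split_wav, the output naming scheme, and the 5-second default are all assumptions, since the text says only that the clips should be short. Scoring each clip is left to whichever metric is used.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, clip_seconds=5.0):
    """Split a PCM WAV file into consecutive clips and return their paths.

    The last clip may be shorter than clip_seconds.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    clip_paths = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_clip = int(clip_seconds * src.getframerate())
        if frames_per_clip <= 0:
            raise ValueError("clip_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break
            clip_path = out_dir / f"clip_{index:04d}.wav"
            with wave.open(str(clip_path), "wb") as dst:
                # The frame count in the header is corrected when dst closes.
                dst.setparams(params)
                dst.writeframes(frames)
            clip_paths.append(clip_path)
            index += 1
    return clip_paths
```

The reference and the test file would both need to be split with the same clip length so that each pair of clips covers the same stretch of audio.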

There are also much more primitive methods that don't attempt anything perceptual, preferred by peddlers of Bluetooth codecs:

SNR
THD+N
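For comparison with the perceptual metrics, SNR can be computed in a few lines. The sketch below uses the textbook definition, the ratio of reference power to error power in decibels. It assumes two time-aligned sample sequences of equal length, and the function name snr_db is hypothetical. It applies no perceptual weighting, which is exactly the limitation noted above.

```python
import math


def snr_db(reference, degraded):
    """Return the signal-to-noise ratio in dB, treating (reference - degraded) as noise."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if signal_power == 0:
        raise ValueError("reference signal is silent")
    if noise_power == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_power / noise_power)
```

An error that is 10% of the signal amplitude gives roughly 20 dB, regardless of whether that error would be masked or clearly audible.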

External links

Wikipedia:Perceptual Evaluation of Audio Quality
ITU BS.1387 download -- free full text of the standard, straight from the official site.

[1] https://www.opticom.de/download/CorrectionstoBS1387.pdf

[2] https://www.itu.int/en/ITU-R/study-groups/Pages/itu-r-patent-information.aspx

[3] https://github.com/jonnor/machinehearing/blob/09b5060bd03b8a49fc1d0afd8eedba4babca83ca/audio-quality/README.md
