Constant-Q transform

From Hydrogenaudio Knowledgebase
{{stub}}
[[File:constant q transform.png|thumb|256px|A constant-Q spectrogram. Note that the lower frequencies (the bottom part) show horizontal (time-axis) blur.]]
[[File:short time fourier transform.png|thumb|256px|In comparison, a constant-bandwidth spectrogram made with the Goertzel algorithm has the same time-axis blur at all frequencies (vertical slices), but is blurrier along the vertical (frequency) axis in the bottom part of the spectrogram.]]
'''Constant-Q''' and '''variable-Q transforms''' ('''CQT/VQT''') are spectral analysis algorithms that usually have logarithmically spaced frequency bands, with time and frequency resolution that scale from octave to octave. Because their frequency resolution is usually logarithmic, they are well suited to musical representation.
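As a rough illustration of that spacing, the sketch below computes band center frequencies f_k = f_min · 2^(k/b) for b bins per octave, the common constant-Q choice Q = 1/(2^(1/b) − 1), and the window length N_k = Q · fs / f_k that each band needs. The function name `cqt_bands` and the 48 kHz sample rate are illustrative assumptions, not part of any specific implementation.

```python
import math


def cqt_bands(f_min, f_max, bins_per_octave, fs):
    """Return (Q, [(f_k, N_k), ...]) for a constant-Q band layout.

    Assumes geometric spacing f_k = f_min * 2**(k / b) and the common choice
    Q = 1 / (2**(1/b) - 1), so each band is as wide as the gap to its neighbor.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = math.floor(bins_per_octave * math.log2(f_max / f_min)) + 1
    bands = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = math.ceil(q * fs / f_k)  # window length in samples: N_k = Q * fs / f_k
        bands.append((f_k, n_k))
    return q, bands


q, bands = cqt_bands(20.0, 20000.0, 12, 48000)
print(f"Q = {q:.2f}, {len(bands)} bands")
print(f"lowest:  {bands[0][0]:8.1f} Hz, window {bands[0][1]} samples")
print(f"highest: {bands[-1][0]:8.1f} Hz, window {bands[-1][1]} samples")
```

With 12 bins per octave from 20 Hz to 20 kHz this gives 120 bands with Q ≈ 16.8, and the window shrinks from about 40,000 samples at 20 Hz to a few dozen near 20 kHz, which is where the time-axis blur at low frequencies comes from.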

== Overview ==
{{panel|More details are available on [[wikipedia:constant-Q transform|a Wikipedia page about the same topic]].|color=green}}
An [[Fast Fourier Transform|FFT]] spectrum has linearly spaced, constant-bandwidth bins, which makes it well suited to perfect reconstruction. However, because musical notes are logarithmically spaced, and because of how auditory perception works, the FFT is poorly suited to musical analysis, even though it is used in some RTA (real-time analyzer) tools.
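To put numbers on this mismatch, the sketch below compares FFT bin spacing with semitone spacing in 12-tone equal temperament. The 48 kHz sample rate and 4096-point FFT size are arbitrary example values, not taken from any particular analyzer.

```python
def fft_bin_spacing(fs, n_fft):
    # Every FFT bin is fs / n_fft Hz wide, wherever it sits in the spectrum.
    return fs / n_fft


def semitone_step(f):
    # Gap from a note at f Hz to the next semitone up (12-tone equal temperament).
    return f * (2.0 ** (1.0 / 12.0) - 1.0)


fs, n_fft = 48000, 4096
bin_hz = fft_bin_spacing(fs, n_fft)
print(f"FFT bin spacing: {bin_hz:.2f} Hz")
for name, f in [("A1", 55.0), ("A4", 440.0), ("A7", 3520.0)]:
    step = semitone_step(f)
    print(f"{name} ({f:g} Hz): semitone step {step:.2f} Hz = {step / bin_hz:.2f} FFT bins")
```

With these values a single FFT bin (about 11.7 Hz) is wider than three semitones around A1, while around A7 one semitone spans roughly 18 bins, so for note-level analysis the FFT is too coarse in the bass and needlessly fine in the treble.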

A constant-Q transform can be constructed from a multi-band Goertzel filter bank in which each band has its own window size (lower frequencies get longer windows and vice versa) and the bands are spaced logarithmically; for example, 120 Goertzel bands covering the 20 Hz to 20 kHz range, each corresponding to a musical note. However, although auditory perception is non-linear, it is not exactly logarithmic, since pitch perception becomes roughly linear and constant-bandwidth in the bass region.<ref>Christian Schörkhuber, Anssi Klapuri, Nicki Holighaus, Monika Dörfler (2014). [https://www.researchgate.net/publication/274009051_A_Matlab_Toolbox_for_Efficient_Perfect_Reconstruction_Time-Frequency_Transforms_with_Log-Frequency_Resolution A Matlab Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution].</ref>
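The filter-bank construction can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: every band uses a Hann window of N_k = Q · fs / f_k samples centered on the same sample, Q = 1/(2^(1/b) − 1), and amplitudes are normalized by the window sum. `goertzel_power` and `cqt_frame` are hypothetical names; with the defaults (20 Hz, 12 bins per octave, 120 bands) the layout matches the 120-note example above, truncated at the Nyquist frequency.

```python
import math


def goertzel_power(samples, freq, fs):
    """Return |X(freq)|^2 for `samples` using the Goertzel recurrence.

    Works for any frequency, not only integer DFT bins, because the power is
    read from the last two states: |s[N-1] - e^{-jw} s[N-2]|^2.
    """
    w = 2.0 * math.pi * freq / fs
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0  # s[n-1] and s[n-2]
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def cqt_frame(signal, fs, center, f_min=20.0, bins_per_octave=12, n_bins=120):
    """One constant-Q frame: list of (f_k, amplitude) from a Goertzel filter bank."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    frame = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        if f_k >= fs / 2:
            break  # bands at or above Nyquist carry no information
        n_k = max(2, round(q * fs / f_k))  # longer windows for lower bands
        start = center - n_k // 2
        hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n_k - 1)) for i in range(n_k)]
        # Zero-pad wherever the window runs past either end of the signal.
        block = [
            hann[i] * (signal[start + i] if 0 <= start + i < len(signal) else 0.0)
            for i in range(n_k)
        ]
        power = goertzel_power(block, f_k, fs)
        # Dividing by the window sum makes a unit-amplitude sine read about 0.5.
        frame.append((f_k, math.sqrt(max(power, 0.0)) / sum(hann)))
    return frame


fs = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(2 * fs)]
frame = cqt_frame(tone, fs, center=fs, f_min=55.0, n_bins=48)
peak_freq, peak_amp = max(frame, key=lambda band: band[1])
print(f"strongest band: {peak_freq:.1f} Hz, amplitude {peak_amp:.3f}")
```

Running it on a 440 Hz test tone puts the strongest band at exactly 440 Hz with an amplitude close to 0.5 (half the sine's peak, as expected for a one-sided spectrum). Recomputing every band from scratch for each frame is simple but costly, since the lowest band alone reads tens of thousands of samples at typical sample rates.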
 
[[File:variable q spectrogram.png|thumb|256px|The variable-Q transform gains better temporal resolution at lower frequencies (at the cost of somewhat poorer frequency resolution there), much like a traditional STFT, while keeping pseudo-logarithmic resolution at higher frequencies.]]
[[File:constant q spectrogram.png|thumb|256px|In comparison, the CQT version has poor time resolution at lower frequencies but sharper frequency-axis resolution than the VQT version.]]
 
Additionally, the gamma (γ) parameter can be used to gradually reduce the Q factor at lower frequencies, improving temporal resolution in that region. Alternatively, the bands can be spaced on a perceptual frequency scale such as Mel or Bark; this works best when each band's bandwidth is set to '''abs(high - low)''', the distance between its upper and lower edge frequencies. Either way, the result is a variable-Q transform.<ref>Filip ZAPLATA, Miroslav KASAL (2015). [https://www.researchgate.net/publication/276076772_Efficient_Spectral_Power_Estimation_on_an_Arbitrary_Frequency_Scale Efficient Spectral Power Estimation on an Arbitrary Frequency Scale].</ref>
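Both options can be made concrete by computing each band's bandwidth B_k, its Q_k = f_k / B_k and a window length of about fs / B_k samples. The sketch assumes γ acts as an additive bandwidth offset, B_k = α · f_k + γ, with α = 2^(1/b) − 1 chosen so that γ = 0 gives back the constant-Q layout; particular toolboxes may define α differently. The Mel variant uses the common 2595 · log10(1 + f/700) formula and treats consecutive Mel-spaced edges as each band's low and high frequencies. All function names are hypothetical.

```python
import math


def vqt_bands(f_min, bins_per_octave, n_bins, gamma, fs):
    """Per-band (f_k, B_k, Q_k, N_k) for a variable-Q layout.

    Assumed form: B_k = alpha * f_k + gamma with alpha = 2**(1/b) - 1,
    so gamma = 0 reproduces constant Q = 1 / (2**(1/b) - 1).
    """
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    bands = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        bw = alpha * f_k + gamma  # gamma matters most where alpha * f_k is small
        bands.append((f_k, bw, f_k / bw, fs / bw))  # window N_k ~ fs / B_k samples
    return bands


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_bands(f_lo, f_hi, n_bands, fs):
    """Mel-spaced bands whose bandwidth is abs(high - low) of their own edges."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    edges = [mel_to_hz(m_lo + (m_hi - m_lo) * i / n_bands) for i in range(n_bands + 1)]
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        bw = abs(high - low)
        center = mel_to_hz((hz_to_mel(low) + hz_to_mel(high)) / 2.0)
        bands.append((center, bw, center / bw, fs / bw))
    return bands


for f_k, bw, q_k, n_k in vqt_bands(20.0, 12, 120, 10.0, 48000)[::24]:
    print(f"{f_k:8.1f} Hz  B={bw:7.1f} Hz  Q={q_k:5.2f}  N={n_k:8.0f}")
```

With γ = 10 Hz, for example, Q at 20 Hz drops from about 16.8 to under 2 (a far shorter window), while the top bands stay close to constant-Q.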

The FFT can be combined with frequency-domain kernels to compute a CQT efficiently; computing a CQT directly, by contrast, is slow even with the Goertzel algorithm, unless a [[sliding DFT]] is used.
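A minimal sketch of the sliding-DFT idea for a single CQT band follows, assuming a rectangular window for brevity (a Hann window can be obtained by combining the complex outputs of the band and its two neighbors at f_k ± fs/N_k). `SlidingBin` is a hypothetical helper; a full analyzer would keep one instance per band, each with its own N_k. The update S_n = e^{jω}(S_{n−1} − x[n−N]) + x[n] · e^{−jω(N−1)} is exact for non-integer bin frequencies, which CQT bands usually are.

```python
import cmath
import math
from collections import deque


class SlidingBin:
    """Sliding DFT of one (possibly non-integer) frequency over the last n samples."""

    def __init__(self, freq, n, fs):
        w = 2.0 * math.pi * freq / fs
        self.rot = cmath.exp(1j * w)               # e^{jw}
        self.tail = cmath.exp(-1j * w * (n - 1))   # weight of the newest sample
        self.buf = deque([0.0] * n, maxlen=n)      # last n input samples
        self.state = 0j                            # DFT of the all-zero start window

    def push(self, x):
        """Add one sample in O(1) and return the band magnitude."""
        oldest = self.buf[0]
        self.buf.append(x)  # maxlen drops `oldest` automatically
        self.state = self.rot * (self.state - oldest) + x * self.tail
        return abs(self.state)


fs = 8000
n = 306  # about Q * fs / f_k for a 440 Hz band with Q of about 16.8
band = SlidingBin(freq=440.0, n=n, fs=fs)
tone = [math.sin(2.0 * math.pi * 440.0 * t / fs) for t in range(fs)]
mags = [band.push(x) for x in tone]
print(f"magnitude after one second: {mags[-1]:.1f} (about n / 2 for a unit sine)")
```

Each new sample costs a constant amount of work per band instead of a full N_k-sample sum. Because the rotation factor has magnitude exactly 1, rounding errors are never damped; long-running implementations often multiply by a factor slightly below 1 to keep the recursion stable.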
== References ==
<references/>

== List of audio applications that use CQT ==
* [https://ffmpeg.org/ffmpeg-filters.html#showcqt showcqt] and [https://ffmpeg.org/ffmpeg-filters.html#showcwt showcwt] filters in FFmpeg
* [https://github.com/cnlohr/colorchord ColorChord] chromatic sound-to-light mapping system
* [https://codepen.io/TF3RDL/pen/poQJwRW Frequency bands spectrum analyzer using either FFT or CQT] (CodePen audio visualization project)
* [https://editor.p5js.org/jayadiandri/sketches/GyKsfn8JO Non-realtime spectrogram] (Interactive showcase of non-realtime spectrogram with various algorithms)
[[Category:Technical]]
[[Category:Signal Processing]]

Latest revision as of 05:12, 22 September 2023