Constant-Q transform
Revision as of 06:47, 17 May 2023
This article is a stub. You can help the Hydrogenaudio Knowledgebase by expanding it.
Constant-Q and variable-Q transforms (CQT/VQT) are spectral analysis algorithms that usually have logarithmically spaced frequency bands, with a time/frequency resolution that follows an octave series. Because of this usually logarithmic frequency resolution, they are well suited to representing music.
Overview
The bins of an FFT are linearly spaced in frequency and have constant bandwidth, which is well suited to perfect reconstruction. However, because musical notes are logarithmically spaced, and because of how auditory perception works, the FFT is a poor fit for musical analysis, even though some RTA analyzers use it.
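The mismatch between linear FFT bins and logarithmic note spacing can be put in numbers. The following is a minimal sketch, assuming 12-tone equal temperament; the function names and the example FFT size are illustrative and do not come from any particular analyzer.

```python
def fft_bin_spacing_hz(sample_rate, fft_size):
    """Distance between adjacent FFT bins; the same at every frequency."""
    return sample_rate / fft_size


def semitone_spacing_hz(freq_hz):
    """Distance from a note to the next semitone up (12-tone equal temperament)."""
    return freq_hz * (2.0 ** (1.0 / 12.0) - 1.0)


def bins_per_semitone(freq_hz, sample_rate, fft_size):
    """How many FFT bins fall between a note and its upper neighbour."""
    return semitone_spacing_hz(freq_hz) / fft_bin_spacing_hz(sample_rate, fft_size)


if __name__ == "__main__":
    # Assumed example settings: 44.1 kHz audio, 4096-point FFT.
    for note_hz in (55.0, 440.0, 3520.0):
        print(note_hz, bins_per_semitone(note_hz, 44100, 4096))
```

With these assumed settings, a semitone in the bass spans less than one FFT bin, while a semitone in the treble spans many bins, so the FFT is too coarse at the bottom and finer than needed at the top.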
The constant-Q transform can be constructed from a bank of Goertzel filters, each with its own window size: lower frequencies use longer windows and higher frequencies use shorter ones. The bands are spaced logarithmically (for example, 120 Goertzel bands covering the 20 Hz to 20 kHz range, with each band corresponding to a musical note). However, while auditory perception is non-linear, it is not exactly logarithmic, as pitch perception is linear and constant-bandwidth in the bass region.
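The multi-band Goertzel construction can be sketched as follows. This is a minimal illustration, not the implementation used by any of the applications listed on this page. It assumes a Hann window, a Q derived from the bin spacing as 1 / (2^(1/b) - 1), and analysis of only the most recent samples of the input; the function names and defaults are hypothetical.

```python
import math


def goertzel_magnitude(samples, freq_hz, sample_rate):
    """Amplitude estimate at one frequency using the Goertzel recurrence.

    A Hann window is applied on the fly (an assumption of this sketch).
    """
    n = len(samples)
    omega = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(omega)
    s1 = s2 = 0.0
    for i, x in enumerate(samples):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)  # periodic Hann
        s0 = x * w + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # The Hann window sums to n/2 and a real sinusoid splits its energy
    # between positive and negative frequency, hence the factor 4/n.
    return 4.0 * math.sqrt(max(power, 0.0)) / n


def cqt_goertzel(signal, sample_rate, f_min=20.0, bins_per_octave=12, n_bins=120):
    """Return (centre frequency, magnitude) pairs for the end of `signal`."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q for all bands
    bands = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        if f_k >= sample_rate / 2.0:
            break  # stop at the Nyquist frequency
        # Window length shrinks as frequency rises, keeping Q constant.
        n_k = min(len(signal), max(2, round(q * sample_rate / f_k)))
        bands.append((f_k, goertzel_magnitude(signal[-n_k:], f_k, sample_rate)))
    return bands


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 440.0 * i / sr) for i in range(sr)]
    bands = cqt_goertzel(tone, sr, f_min=55.0, n_bins=48)
    print(max(bands, key=lambda band: band[1]))
```

Each band costs one pass over its own window, so the lowest bands dominate the running time because their windows are the longest.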
Additionally, a gamma parameter can be used to gradually reduce the Q factor at lower frequencies, which improves temporal resolution in that region. Alternatively, the bands can be spaced on a perceptual frequency scale such as Mel or Bark; this works best when the bandwidth of each band is set to abs(high - low). Either way, the result is a variable-Q transform.
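Both variants can be sketched in a few lines. The example below makes two assumptions that are not stated in the article: the gamma parameter is taken to be an offset in Hz added to the constant-Q bandwidth (one common formulation), and high and low are taken to be the band edges placed half a step either side of the band centre on the Mel scale. The function names are hypothetical.

```python
import math


def vq_band(f_center, bins_per_octave=12, gamma=0.0):
    """Return (bandwidth, Q) for one band; gamma=0 gives a constant-Q band.

    Assumed formulation: bandwidth = alpha * f_center + gamma, so gamma
    matters most at low frequencies, where it lowers Q.
    """
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    bandwidth = alpha * f_center + gamma
    return bandwidth, f_center / bandwidth


def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bands(f_min, f_max, n_bands):
    """Return (centre, bandwidth) pairs with centres evenly spaced in Mel."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands - 1)
    bands = []
    for k in range(n_bands):
        m = m_lo + k * step
        low = mel_to_hz(m - 0.5 * step)   # assumed lower band edge
        high = mel_to_hz(m + 0.5 * step)  # assumed upper band edge
        bands.append((mel_to_hz(m), abs(high - low)))
    return bands


if __name__ == "__main__":
    for f in (55.0, 440.0, 7040.0):
        print(f, vq_band(f, gamma=0.0), vq_band(f, gamma=20.0))
    print(mel_bands(20.0, 20000.0, 5))
```

In either case the window length for a band would follow from its bandwidth rather than from a fixed Q, which is what makes the transform variable-Q.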
An FFT can be used in conjunction with frequency-domain kernels to compute a CQT. Calculating a CQT directly, by contrast, is slow even with the Goertzel algorithm unless a sliding DFT is used.
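The reason a sliding DFT helps is that each new sample updates a band in constant time instead of reprocessing the whole window. The following is a minimal sketch of a single sliding-DFT band with a rectangular window, written in a general form so the centre frequency does not need to fall on an integer bin. The class name is hypothetical, and windowing and long-term drift control are left out.

```python
import cmath
import math
from collections import deque


class SlidingDFTBin:
    """One band of a sliding DFT over the last `window_len` samples."""

    def __init__(self, freq_hz, sample_rate, window_len):
        omega = 2.0 * math.pi * freq_hz / sample_rate
        self.rot = cmath.exp(1j * omega)                 # per-sample rotation
        self.tail = cmath.exp(-1j * omega * window_len)  # phase of the newest sample
        self.buf = deque([0.0] * window_len, maxlen=window_len)
        self.state = 0j

    def push(self, x):
        """Add one sample and return the updated complex band value."""
        oldest = self.buf[0]
        self.buf.append(x)  # maxlen drops the oldest sample automatically
        # Remove the oldest sample, add the newest, rotate by one sample.
        self.state = self.rot * (self.state - oldest + x * self.tail)
        return self.state


def direct_dft(samples, freq_hz, sample_rate):
    """Reference: the same band value computed from scratch."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    return sum(x * cmath.exp(-1j * omega * m) for m, x in enumerate(samples))


if __name__ == "__main__":
    sr, f, n = 8000, 440.0, 306
    signal = [math.sin(2.0 * math.pi * f * i / sr) for i in range(1000)]
    band = SlidingDFTBin(f, sr, n)
    for sample in signal:
        value = band.push(sample)
    print(abs(value - direct_dft(signal[-n:], f, sr)))
```

A CQT built this way would keep one such band per centre frequency, each with its own window length, at the cost of one sample buffer per band.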
List of audio applications that use CQT
- CQT Analyzer (foo_cqt_analyzer) visualization component for foobar2000
- showcqt and showcwt filters in FFmpeg
- ColorChord chromatic sound-to-light mapping system