Fraunhofer FDK AAC

From Hydrogenaudio Knowledgebase
Revision as of 07:11, 12 August 2023 by Artoria2e5
Current AAC encoders
(most to least recommended)
1 Apple AAC M/W
2 FhG AAC (Winamp) W
3 Fraunhofer FDK AAC S/L/M/W
4 Nero AAC L/W
5 FFmpeg 3.0+ AAC encoder S/L/M/W
6 FAAC S/L/M/W
7 Libav (pre-3.0 FFmpeg) AAC encoder S/L/M/W
S Source code available; L Linux; M macOS; W Windows
List of AAC encoders

The Fraunhofer FDK AAC is a high-quality open-source AAC encoder library developed by Fraunhofer IIS. It was officially released for Android, but has been ported to other platforms.

The licensed Fraunhofer AAC codec included in Winamp (often called FhG AAC) is not the same as the FDK AAC codec. While they use the same approach, they are developed by different teams, and target different platforms. The FDK library is built around fixed-point math and originally targeted low-delay communication on mobile devices.

FDK AAC is considered a favorable alternative to the Nero AAC codec, which is no longer developed.

Software Versions

Package/Component Version Developer/Maintainer License Description
FDK Encoder 4.0.0 [1] Fraunhofer IIS FDK License The FDK AAC library included in Android.
FDK SBR/PS Encoder
(for HE & HEv2)
4.0.0 [2]
FDK Decoder 3.0.0 [3]
FDK SBR/PS Decoder 2.2.6 [4]
fdk-aac 2.0.0 (2018-11-22)
based on FDK AAC (4.0.0/3.0.0)
shared library version 2.0.0
Martin Storsjö/Opencore AMR project FDK License with additions under Apache 2.0 license The FDK AAC encoder and decoder as a portable library separate from Android.
fdk-aac (debian) 0.1.6-1 [5]
libfdk-aac0 (shared library version 1.1.0)
The Debian source package for fdk-aac. Includes libfdk-aac* and the aac-enc encoding front end.
fdkaac 1.0.0 (using libfdk-aac 0.1.6) nu774 zlib An advanced front-end to the FDK AAC encoder using libfdk-aac.
FFmpeg/Libav support Libav: encode, decode
FFmpeg: encode, decode
Martin Storsjö ISC license A wrapper for libfdk-aac that adds support to FFmpeg and Libav/avconv. It is included in both projects, and is the recommended AAC encoder for FFmpeg.

FDK License

The license included by Fraunhofer in the FDK source code specifically allows distribution in source or binary forms, but does not license the patented technologies described by the source code. It goes on to say "you may use this FDK AAC Codec software or modifications thereto only for purposes that are authorized by appropriate patent licenses".[1] As this clause governs use, it should not affect distribution. The Free Software Foundation (FSF) considers it fishy to have an invitation to purchase patent licenses in the text, but concedes that "any program is potentially threatened by patents". As far as the patent terms are concerned, all AAC software is equally affected.

This license puts a limitation on charging for software that includes the library, leading Debian to consider it non-free. Debian does not comment on the patent situation.

The position of FFmpeg is that although the license is GPL-incompatible (and therefore nondistributable with GPL parts), it is acceptable to distribute the library with LGPL parts.[2] FFmpeg does not care about patents. (The AAC patent license covers both encoder and decoder, so using fdk_aac does not add patent violations to FFmpeg.)

Free Software?

Party Classification Note
Debian Non-free [6][7] (fee clause)
Fedora/Red Hat Free but not Allowed [8]: Fedora has since adopted a more defensive posture to patent language, making it not Allowed
FSF Free (but warns about patents) [9]

Afterburner

Afterburner is "a type of analysis by synthesis algorithm which increases the audio quality but also the required processing power." Fraunhofer recommends always activating this feature.

Audio Object Types

The library supports the following MPEG-2/4 AOTs:

Object Type ID Audio Object Type Description
2 AAC-LC "AAC Profile" MPEG-2 Low-complexity (LC) combined with MPEG-4 Perceptual Noise Substitution (PNS)
5 HE-AAC AAC LC + SBR (Spectral Band Replication)
29 HE-AAC v2 AAC LC + SBR + PS (Parametric Stereo)
23 AAC-LD "Low Delay Profile" used for real-time communication
39 AAC-ELD Enhanced Low Delay
129 MPEG-2 AAC LC
132 MPEG-2 HE-AAC (SBR)
156 MPEG-2 HE-AAC v2 (SBR+PS)

Bitrate Modes

AACENC_BITRATEMODE Mode Stream Bitrate
0 Constant Bitrate (CBR) As specified by AACENC_BITRATE
1-5 Variable Bitrate (VBR) Calculated based on channel layout (See table below)
6 Fixed frame mode.
7 Superframe mode.
8 LD/ELD full bitreservoir for packet based transmission

The bitrate limit for each variable bitrate mode. [10] HE and HEv2 will often end up with actual bitrates far below these limits.

AACENC_BITRATEMODE   Bitrate per channel (LC)   AOTs
(VBR Modes)          Mono        Stereo a
1                    32 kbps     20 kbps        LC, HE, HEv2
2                    40 kbps     32 kbps        LC, HE, HEv2
3                    56 kbps     48 kbps        LC, HE, HEv2
4                    72 kbps     64 kbps        LC
5                    112 kbps    96 kbps        LC

a Note that a "stereo" channel is any that is bonded with another channel, as noted with a plus sign in the channel layouts table.

Example Bitrate Calculations

Profile VBR Mode Channel layout Expected stream bitrate
LC 3 L+R 2 "stereo" channels at 48kbps = 96kbps
LC 3 C, L+R 1 "mono" center channel at 56 kbps and 2 "stereo" channels at 48kbps = 152kbps
LC 4 C, L+R, LS+RS, LFE 1 "mono" center channel and 1 mono LFE channel each at 72kbps, and 4 "stereo" channels (2 sets of 2) each at 64kbps = 400kbps
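
The example calculations above can be reproduced with a short script. This is only a sketch: the function name and the layout parsing are illustrative and not part of the FDK library, and the per-channel figures are copied from the VBR table above. It treats any element containing a plus sign as a pair of "stereo" channels and anything else as a "mono" channel.

```python
# Per-channel VBR bitrate limits in kbps for AAC-LC, copied from the VBR table:
# mode -> (mono channel, "stereo" channel)
VBR_KBPS_PER_CHANNEL = {
    1: (32, 20),
    2: (40, 32),
    3: (56, 48),
    4: (72, 64),
    5: (112, 96),
}


def expected_vbr_kbps(mode, layout):
    """Sum the per-channel limits for a layout such as "C, L+R, LS+RS, LFE"."""
    mono_kbps, stereo_kbps = VBR_KBPS_PER_CHANNEL[mode]
    total = 0
    for element in layout.split(","):
        channels = element.strip().split("+")
        # A lone channel counts as "mono"; channels bonded with "+" count as "stereo".
        rate = mono_kbps if len(channels) == 1 else stereo_kbps
        total += rate * len(channels)
    return total


print(expected_vbr_kbps(3, "L+R"))                 # 96
print(expected_vbr_kbps(3, "C, L+R"))              # 152
print(expected_vbr_kbps(4, "C, L+R, LS+RS, LFE"))  # 400
```

These figures are upper limits for LC; as noted above, HE and HEv2 streams often end up far below them.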

Bandwidth

A spectrogram showing the effect of the FDK AAC low-pass filter.

The default bandwidth (or low-pass filter cutoff) for each bitrate mode will be the minimum of the appropriate value in the tables below or half the sample rate. This can be overridden, but the maximum value is 20000 Hz. [11]

The fdk-aac parameter is AACENC_BANDWIDTH. More information can be found in the official documentation, section 3.1 Bandwidth.

HE-AAC/SBR

The HE-AAC and HE-AACv2 profiles encode audio using AAC-LC at one half the sample rate, relying on Spectral Band Replication (SBR) to attempt reconstruction of the missing higher frequencies. The end result is an apparent full bandwidth transmission (as if no low-pass filter was applied), even though the actual AAC-LC encoded audio is only storing frequencies up to 1/4 the original sample rate.
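
The arithmetic behind that statement is small enough to sketch. The function below is illustrative only (its name is hypothetical) and assumes the dual-rate arrangement described above, where the AAC-LC core runs at half the input sample rate.

```python
def he_aac_core(sample_rate_hz):
    """Return (core sample rate, highest frequency the AAC-LC core can store)."""
    core_rate = sample_rate_hz // 2   # the AAC-LC core runs at half the input rate
    core_bandwidth = core_rate // 2   # Nyquist limit of the core: 1/4 of the input rate
    return core_rate, core_bandwidth


print(he_aac_core(44100))  # (22050, 11025)
print(he_aac_core(48000))  # (24000, 12000)
```

Everything above the second figure has to be reconstructed by SBR in the decoder.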

VBR Modes

AACENC_BITRATEMODE   Bandwidth (mono, and two or more channels)
1                    13050 Hz
2                    13050 Hz
3                    14260 Hz
4                    15500 Hz
5                    Full range, no filter
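
The "minimum of the table value or half the sample rate" rule can be sketched as follows. The function name is hypothetical, the cutoff values are copied from the VBR table above, and the handling of mode 5 is an assumption: "no filter" is taken to mean that the bandwidth is simply half the sample rate.

```python
# Default low-pass cutoff in Hz per VBR mode, from the VBR table.
# None stands for "full range, no filter".
VBR_BANDWIDTH_HZ = {1: 13050, 2: 13050, 3: 14260, 4: 15500, 5: None}


def default_vbr_bandwidth(mode, sample_rate_hz):
    """Return the default bandwidth in Hz for a VBR mode at a given sample rate."""
    nyquist = sample_rate_hz // 2
    table_value = VBR_BANDWIDTH_HZ[mode]
    if table_value is None:
        # Assumption: with no filter, the usable bandwidth is half the sample rate.
        return nyquist
    return min(table_value, nyquist)


print(default_vbr_bandwidth(3, 44100))  # 14260
print(default_vbr_bandwidth(4, 22050))  # 11025, limited by the sample rate
```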

CBR Mode

AOT / Sample Rate   Bitrate per channel   Mono        Two or More Channels
LC / Any            Below 12 kbps         3700 Hz     5000 Hz
                    12-20 kbps            5000 Hz     6400 Hz
                    20-28 kbps            6900 Hz     9640 Hz
                    28-40 kbps            9600 Hz     13050 Hz
                    40-56 kbps            12060 Hz    14260 Hz
                    56-72 kbps            13950 Hz    15500 Hz
                    72-96 kbps            14200 Hz    16120 Hz
                    96 kbps and above     17000 Hz (mono, and two or more channels)
...
LD / 44100 Hz       56 kbps               11000 Hz    12900 Hz
                    64 kbps               14400 Hz    15500 Hz
...

Sample Format

The FDK library is based on fixed-point math and only supports 16-bit integer PCM input.

Sample Rates

The FDK library officially supports input sample rates of 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000, 64000, 88200, and 96000 Hz.

See Issues/GetInvInt table limit if experiencing crashes with high sample rates and VBR.

Also see Recommended Sampling Rate and Bitrate Combinations.

Channel Layouts

Channels Layout Mode Description
1 C MODE_1 Mono
2 L+R MODE_2 Stereo
3 C, L+R MODE_1_2
4 C, L+R, Rear MODE_1_2_1 fdkaac calls it "C L R Cs"
5 C, L+R, LS+RS MODE_1_2_2
5.1 C, L+R, LS+RS, LFE MODE_1_2_2_1
7.1 C, LC+RC, L+R, LS+RS, LFE MODE_1_2_2_2_1
MODE_7_1_FRONT_CENTER
7.1 (Rear) C, L+R, LS+RS, Lrear+Rrear, LFE MODE_7_1_REAR_SURROUND

The plus sign (+) denotes "stereo" channels.

Issues

GetInvInt table limit

As of FDK version 3.4.12, not all combinations of audio object types, bitrate modes, channel layouts, and sample rates can be used together, due to a limited table of pre-computed values used by the encoder.

For example, using 96kHz stereo input with the AAC-LC audio object type and bitrate mode 5 (VBR 96-112kbps/channel) will result in catastrophic failure: [12]

./libFDK/include/fixpoint_math.h:459: FIXP_DBL GetInvInt(int): Assertion `(intValue > 0) && (intValue < 50)' failed.
Aborted (core dumped)

A recent (August 2014) patch to libfdk-aac fixes most of the previously unsupported combinations [13], and is expected to be included in the next official version of the FDK AAC library.

See Libav/avconv for a workaround.

Recommended Sampling Rate and Bitrate Combinations

This table is from the documentation included in the FDK library source code. (PDF section 2.12 or source code: [14])

The following table provides an overview of recommended encoder configuration parameters which [Fraunhofer] determined by virtue of numerous listening tests.

Audio Object Type                   Bit Rate Range     Supported Sampling    Recommended Sampling   Number of
                                    [bit/s]            Rates [kHz]           Rate [kHz]             Channels
[29] HE-AAC v2 (AAC LC + SBR + PS)  8000 - 11999       22.05, 24.00          24.00                  2
                                    12000 - 17999      32.00                 32.00                  2
                                    18000 - 39999      32.00, 44.10, 48.00   44.10                  2
                                    40000 - 56000      32.00, 44.10, 48.00   48.00                  2
[5] HE-AAC (AAC LC + SBR)           8000 - 11999       22.05, 24.00          24.00                  1
                                    12000 - 17999      32.00                 32.00                  1
                                    18000 - 39999      32.00, 44.10, 48.00   44.10                  1
                                    40000 - 56000      32.00, 44.10, 48.00   48.00                  1
                                    16000 - 27999      32.00, 44.10, 48.00   32.00                  2
                                    28000 - 63999      32.00, 44.10, 48.00   44.10                  2
                                    64000 - 128000     32.00, 44.10, 48.00   48.00                  2
[5] HE-AAC (AAC LC + SBR)           64000 - 69999      32.00, 44.10, 48.00   32.00                  5, 5.1
                                    70000 - 159999     32.00, 44.10, 48.00   44.10                  5, 5.1
                                    160000 - 245999    32.00, 44.10, 48.00   48.00                  5
                                    160000 - 265999    32.00, 44.10, 48.00   48.00                  5.1
[2] AAC LC                          8000 - 15999       11.025, 12.00, 16.00  12.00                  1
                                    16000 - 23999      16.00                 16.00                  1
                                    24000 - 31999      16.00, 22.05, 24.00   24.00                  1
                                    32000 - 55999      32.00                 32.00                  1
                                    56000 - 160000     32.00, 44.10, 48.00   44.10                  1
                                    160001 - 288000    48.00                 48.00                  1
[2] AAC LC                          16000 - 23999      11.025, 12.00, 16.00  12.00                  2
                                    24000 - 31999      16.00                 16.00                  2
                                    32000 - 39999      16.00, 22.05, 24.00   22.05                  2
                                    40000 - 95999      32.00                 32.00                  2
                                    96000 - 111999     32.00, 44.10, 48.00   32.00                  2
                                    112000 - 320001    32.00, 44.10, 48.00   44.10                  2
                                    320002 - 576000    48.00                 48.00                  2
[2] AAC LC                          160000 - 239999    32.00                 32.00                  5, 5.1
                                    240000 - 279999    32.00, 44.10, 48.00   32.00                  5, 5.1
                                    280000 - 800000    32.00, 44.10, 48.00   44.10                  5, 5.1
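
A table like this lends itself to a simple range lookup. The sketch below covers only the two-channel AAC LC rows, copied from the table above; the function and variable names are hypothetical, and the other object types and channel counts would need their own row lists.

```python
# (lowest bit/s, highest bit/s, recommended sampling rate in kHz)
# for AAC LC with 2 channels, copied from the table.
AAC_LC_STEREO = [
    (16000, 23999, 12.00),
    (24000, 31999, 16.00),
    (32000, 39999, 22.05),
    (40000, 95999, 32.00),
    (96000, 111999, 32.00),
    (112000, 320001, 44.10),
    (320002, 576000, 48.00),
]


def recommended_rate_khz(bitrate, rows=AAC_LC_STEREO):
    """Return the recommended sampling rate, or None if the bitrate is out of range."""
    for low, high, rate in rows:
        if low <= bitrate <= high:
            return rate
    return None


print(recommended_rate_khz(128000))  # 44.1
print(recommended_rate_khz(8000))    # None: below the table's lowest stereo LC range
```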

(lib)fdk-aac

Martin Storsjö (as the opencore-amr project) maintains a source code distribution of the Fraunhofer library as fdk-aac. It is distributed in a binary form in Debian (and Debian derivatives like Ubuntu) as the package fdk-aac, which includes the libfdk-aac* and aac-enc binaries.

See Software Versions for latest release information.

Links

  • Source at Github
  • fdk-aac at Debian package tracker. Package includes libfdk-aac* and the aac-enc binary.

aac-enc

fdk-aac includes a very, very basic command-line interface encoding utility, called aac-enc, that can encode to AAC from WAV.

Usage

aac-enc [-r bitrate] [-t aot] [-a afterburner] [-s sbr] [-v vbr] in.wav out.aac
-r <bitrate>
Bitrate in bits per second (for CBR). Default is 64000.
-t <aot>
The Audio Object Type. Default is 2 (AAC-LC).
-a <0,1>
Enable Afterburner. 0=Disabled, 1=Enabled (recommended). Default is 1.
-s <-1,0,1>
Spectral Band Replication (ELD AOT only). -1=Use ELD SBR auto configurator (default, recommended), 0=Disabled, 1=Enabled. Default is -1.
-v <0-5>
Bitrate mode. Only 0-5 used. 0=CBR @ value given in -r. Default is 0.
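
When scripting aac-enc, it can help to build the argument list programmatically. The helper below is a hypothetical sketch, not part of fdk-aac; it only uses the flags listed above and leaves out any option that is not given, so the tool's own defaults apply.

```python
def aac_enc_command(in_wav, out_aac, bitrate=None, aot=None,
                    afterburner=None, sbr=None, vbr=None):
    """Build an argument list for aac-enc; options left as None are omitted."""
    argv = ["aac-enc"]
    options = (("-r", bitrate), ("-t", aot), ("-a", afterburner),
               ("-s", sbr), ("-v", vbr))
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv + [in_wav, out_aac]


# AAC-LC (AOT 2) in VBR mode 4
print(aac_enc_command("in.wav", "out.aac", aot=2, vbr=4))
# ['aac-enc', '-t', '2', '-v', '4', 'in.wav', 'out.aac']
```

The resulting list can be passed to subprocess.run() on a system where aac-enc is installed.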

fdkaac

fdkaac is a command-line interface encoding and metadata utility. It is maintained by nu774 and is licensed under the zlib license. It employs libfdk-aac for encoding.

See Software Versions for latest release information.

Examples

# Convert a FLAC file to m4a using fdkaac configured for AAC-LC at about 50kbps/channel (100kbps for stereo).
flac -s -d -c song.flac | fdkaac --ignorelength --profile 2 --bitrate-mode 3 -o song.m4a -

Usage

fdkaac [options] input_file
-p, --profile <n>
The Audio Object Type.
-b, --bitrate <n>
Bitrate in bits per second (for CBR)
-m, --bitrate-mode <n>
Bitrate mode. Only 0-5 used. 0=CBR.
-w, --bandwidth <n>
Frequency bandwidth in Hz (AAC LC only)
-a, --afterburner <n>
Enable Afterburner. 0=Disabled, 1=Enabled (recommended). Default is 1.
-L, --lowdelay-sbr <-1,0,1>
Configure SBR activity on AAC ELD
-1 Use ELD SBR auto configurator
0 Disable SBR on ELD (default)
1 Enable SBR on ELD
-s, --sbr-ratio <0,1,2>
Controls activation of downsampled SBR
0 Use lib default (default)
1 Downsampled SBR (default for ELD+SBR)
2 Dual-rate SBR (default for HE-AAC)
-f, --transport-format <n>
Transport format
0 RAW (default, muxed into M4A)
1 ADIF
2 ADTS
6 LATM MCP=1
7 LATM MCP=0
10 LOAS/LATM (LATM within LOAS)
-C, --adts-crc-check
Add CRC protection on ADTS header
-h, --header-period <n>
StreamMuxConfig/PCE repetition period in transport layer
-o <filename>
Output filename
-G, --gapless-mode <n>
Encoder delay signaling for gapless playback
0 iTunSMPB (default)
1 ISO standard (edts + sgpd)
2 Both
--include-sbr-delay
Count SBR decoder delay in encoder delay. This is not iTunes compatible, but is default behavior of FDK library.
-I, --ignorelength
Ignore length of WAV header
-S, --silent
Don't print progress messages
--moov-before-mdat
Place moov box before mdat box on m4a output

Options for raw (headerless) input:

-R, --raw
Treat input as raw (by default WAV is assumed)
--raw-channels <n>
Number of channels (default: 2)
--raw-rate <n>
Sample rate (default: 44100)
--raw-format <spec>
Sample format, default is "S16L". Spec is as follows:
1st char S(igned), U(nsigned), or F(loat)
2nd part bits per channel
Last char L(ittle) or B(ig)
Last char can be omitted, in which case L is assumed. Spec is case insensitive, therefore "u16b" is same as "U16B".
Up to 32-bit integer or 64-bit floating point format is supported as input. The FDK library, however, is implemented based on fixed point math and only supports 16-bit integer PCM. Therefore, be wary of clipping. You might want to dither/noise shape beforehand when your input has higher resolution.
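
The --raw-format spec described above can be illustrated with a small parser. This is not fdkaac's own parsing code, only a sketch of the grammar as documented here (type letter, bit depth, optional endianness letter, case insensitive); the function name is hypothetical.

```python
import re


def parse_raw_format(spec):
    """Split a spec such as "S16L" into (kind, bits, endianness)."""
    match = re.fullmatch(r"([SUF])(\d+)([LB]?)", spec.upper())
    if match is None:
        raise ValueError(f"invalid raw format spec: {spec!r}")
    kind, bits, endian = match.groups()
    # The last character may be omitted, in which case little-endian is assumed.
    return kind, int(bits), endian or "L"


print(parse_raw_format("S16L"))  # ('S', 16, 'L')
print(parse_raw_format("u16b"))  # ('U', 16, 'B')
print(parse_raw_format("F32"))   # ('F', 32, 'L')
```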

Tagging options:

--tag <fcc>:<value>
Set iTunes predefined tag with four char code. See iTunes Metadata.
--tag-from-file <fcc>:<filename>
Same as above, but value is read from file.
--long-tag <name>:<value>
Set arbitrary tag as iTunes custom metadata.
--tag-from-json <filename[?dot_notation]>
Read tags from JSON. By default, tags are assumed to be direct children of the root object(dictionary). Optionally, position of the dictionary that contains tags can be specified with dotted notation.
Option/Usage MP4 Block Modified Comment
--title <string> ©nam
--artist <string> ©ART
--album <string> ©alb
--genre <string> ©gen Appears to always store the string in the "user-defined" ©gen block, even if there is an ID3 genre ID that could be used with the gnre block.
--date <string> ©day YYYY[-MM[-DD]] format
--composer <string> ©wrt
--grouping <string> ©grp
--comment <string> ©cmt
--album-artist <string> aART
--track <number[/total]> trkn Block stores both track and totaltracks in one binary value
--disk <number[/total]> disk Block stores both disc and totaldiscs in one binary value
--tempo <n> tmpo Beats per minute, stored as a 16-bit integer
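
The dotted notation accepted by --tag-from-json can be pictured as a walk through nested dictionaries. The sketch below illustrates that idea in general terms and does not reproduce fdkaac's implementation; the function name and the JSON layout are made up for the example.

```python
import json


def find_tags(document, dot_notation=""):
    """Follow a dotted path such as "format.tags" down to the tag dictionary."""
    node = document
    # An empty path means the tags are direct children of the root object.
    for key in filter(None, dot_notation.split(".")):
        node = node[key]
    return node


doc = json.loads('{"format": {"tags": {"title": "Rock", "artist": "Diatonis"}}}')
print(find_tags(doc, "format.tags"))  # {'title': 'Rock', 'artist': 'Diatonis'}
```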

FFmpeg

libfdk-aac can be used with FFmpeg, but requires a custom build of FFmpeg. FFmpeg provides significant documentation for using libfdk_aac in the FFmpeg wiki.

Usage/Examples

CBR mode:

ffmpeg -i <input> -c:a libfdk_aac -b:a 128k <output>

VBR mode:

ffmpeg -i <input> -c:a libfdk_aac -vbr 3 <output>
-afterburner
Enable Afterburner. 0=Disabled, 1=Enabled (recommended). Default is 1.
-profile:a
The Audio Object Type. Value is one of aac_low (LC), aac_he (HE-AAC), aac_he_v2 (HE-AACv2), aac_ld (LD), or aac_eld (ELD). Default is aac_low.
-b:a
CBR bitrate
-vbr
Values 1-5. See Bitrate mode.
-cutoff
The low-pass filter cut-off in Hz. See Bandwidth for default values. FFmpeg maximum value is 20000.
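
The CBR and VBR invocations above differ only in whether -b:a or -vbr is given, which a small helper can make explicit. The function below is a hypothetical sketch that builds the argument list without running FFmpeg; it uses only the options listed above, written with the single leading dash that FFmpeg options take, and enforces the documented limits (VBR modes 1-5, cutoff up to 20000 Hz).

```python
def ffmpeg_fdk_command(src, dst, bitrate=None, vbr=None, cutoff=None):
    """Build an ffmpeg argument list that encodes audio with libfdk_aac."""
    if (bitrate is None) == (vbr is None):
        raise ValueError("give exactly one of bitrate (CBR) or vbr (VBR mode)")
    argv = ["ffmpeg", "-i", src, "-c:a", "libfdk_aac"]
    if vbr is not None:
        if vbr not in range(1, 6):
            raise ValueError("vbr must be between 1 and 5")
        argv += ["-vbr", str(vbr)]
    else:
        argv += ["-b:a", str(bitrate)]
    if cutoff is not None:
        if cutoff > 20000:
            raise ValueError("cutoff must not exceed 20000 Hz")
        argv += ["-cutoff", str(cutoff)]
    return argv + [dst]


print(ffmpeg_fdk_command("in.flac", "out.m4a", vbr=3))
# ['ffmpeg', '-i', 'in.flac', '-c:a', 'libfdk_aac', '-vbr', '3', 'out.m4a']
```

Running the resulting command still requires an FFmpeg build that includes libfdk_aac.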

Libav/avconv

libfdk-aac can be used with Libav's avconv, but requires a custom build of avconv with "--enable-libfdk-aac" passed to configure. See Libav AAC encoding.

Usage

CBR mode:

avconv -i <input> -c:a libfdk_aac -b:a <bitrate> -afterburner 1 <output>

VBR mode:

avconv -i <input> -c:a libfdk_aac -flags +qscale -global_quality [1-5] -afterburner 1 <output>
-afterburner
See afterburner.
-global_quality
Values 1-5. See Bitrate mode.

FLAC to M4A example with quirks

This example uses a FLAC file with 24-bit/96kHz 5.1 channel audio and embedded album art to demonstrate workarounds for some quirks and bugs. The sample is from the Diatonis Free Surround Sound Music page; the track is titled "Rock".

avconv -i diatonis-rock.flac -vn -sample_fmt s16 -ar 48000 -c:a libfdk_aac -flags +qscale -global_quality 5 diatonis-rock.m4a
-global_quality 5
Use VBR Mode 5.
-vn
Drops all video. The FLAC source has embedded album art that avconv can't handle in this case: Libav apparently doesn't know how to embed cover art in M4A and tries to use it as an MP4 video stream. Using -c:v mjpeg, as can be done with MP3, doesn't work either. See NeroAacTag for a tool that can easily add M4A album art.
-sample_fmt s16 -ar 48000
The FLAC source's 96kHz sample rate combined with VBR mode 5 triggers the GetInvInt table limit bug in libfdk_aac 0.1.3 and earlier. These options resample the audio before sending it to the FDK encoder, to avoid the crash.

References

  1. NOTICE file, fdkaac
  2. ffmpeg -license command output
