Foobar2000:Components 0.9/Chacon (foo chacon)

From Hydrogenaudio Knowledgebase
Revision as of 17:29, 24 September 2022 by Thorna (talk | contribs)

Chacon (foo_chacon)

Chacon (an acronym for charset convertor) is a simple tool for fixing tags by converting them between different character sets.

The offered functionality is essentially similar to what the "Override charset" option in foo_infobox did, though it's accessed directly from the context menu and for any number of tracks at once.

The component can generally be used to fix ID3v1 tags or cue sheets saved in a code page different from that of your system, as well as to perform more complex restoration of files mangled by programs that write incompatible or incorrect tags.


Implementation details

foobar2000 reads Unicode tags as Unicode and multibyte tags as ISO-8859-1 or the local Windows code page, then converts them to UTF-8 for internal use.

Multibyte tags saved on a system with a different locale have therefore been converted to UTF-8 incorrectly, so the first step is to return them to the state in which they are stored in the file (the raw string cannot be obtained from the tag reader directly). That is what the first code page selector in Chacon does. Most of the time, "<system code page>" is the right choice: it performs the same conversion as foobar2000's reader did, only backwards (raw tag ―[system to UTF-8]→ UTF-8 ―[UTF-8 to system]→ raw tag). For files broken in other ways (e.g. resaved as Unicode while already broken), this might need to be set to "<disable>" or an explicit character set.
Note: The "system code page" to UTF-8 conversion is not guaranteed to be 100% reversible, so the conversion may fail for some characters in unusual cases.
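The reversibility caveat can be illustrated with a small Python sketch. This is not how Chacon or foobar2000 implement the conversion; it only mimics the idea with Python's own codecs, which are stricter than the Windows conversion routines about undefined bytes. The helper name `survives_round_trip` and the choice of cp1252 as the "system code page" are assumptions made for the example.

```python
def survives_round_trip(raw: bytes, codepage: str) -> bool:
    """Return True if decoding raw with codepage and encoding it back
    reproduces exactly the same bytes (hypothetical helper)."""
    text = raw.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace") == raw


# Shift-JIS bytes that all map to defined cp1252 characters: reversible.
reversible = survives_round_trip("日本語".encode("shift_jis"), "cp1252")

# The ideographic full stop starts with byte 0x81, which Python's cp1252
# codec leaves undefined, so the original byte is lost on the way in.
lossy = survives_round_trip("。".encode("shift_jis"), "cp1252")
```

When the round trip does not reproduce the original bytes, the second conversion step has nothing correct to work with for those characters.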

Then, when the tag is in its (presumably) original form again, a second conversion is performed, according to the second code page selector. You should choose the character set of the original data, which is often the system code page of the system where the file was originally saved.
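The two steps described above can be sketched in Python as a rough illustration. This is not Chacon's actual code: the function name `fix_tag`, and the choice of cp1252 as the system code page and Shift-JIS as the original character set, are assumptions made for the example.

```python
def fix_tag(text: str,
            system_codepage: str = "cp1252",
            original_charset: str = "shift_jis") -> str:
    """Undo a wrong code page guess (hypothetical helper).

    Step 1 (first selector): turn the mis-decoded text back into the raw
    bytes stored in the file. Step 2 (second selector): decode those bytes
    with the character set the tag was really written in.
    """
    raw = text.encode(system_codepage)
    return raw.decode(original_charset)


# Simulate the damage: a Shift-JIS tag read as if it were cp1252.
raw_tag = "日本語".encode("shift_jis")
mangled = raw_tag.decode("cp1252")

fixed = fix_tag(mangled)  # fixed == "日本語"
```

Both conversions are strict here, so a character that cannot make the trip raises an error instead of silently producing wrong output.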


Links

Download page
Discussion thread


Tutorial


To start, let's put some songs with obviously broken tags in the playlist. I have two tracks here, both with the same original metadata, but each with a differently mangled artist name.

To invoke Chacon, select one and choose "Tagging > Fix Metadata Charset..." from the context (right-click) menu.

Naturally, you can select multiple items at once too, and from any other source (a whole playlist, album list node, result of search, etc.). It makes the whole process faster and safer.


When the dialog appears, it might not be immediately obvious what to do here - that's because the current, default conversion is invalid and doesn't return anything meaningful at all.

This file is supposed to have its artist name in Japanese, so let's select some Japanese code page.


There we go, the title is fine now. Shift-JIS is a very common code page for Japanese, but the right choice could just as well have been one of several other character sets.

Clicking on the drop-down list to try them one by one would be very slow. It is better to use the up and down arrow keys while the list has focus, which lets you go through the whole list quickly and easily.
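Stepping through the list amounts to trying each candidate character set in turn and looking at the result. A minimal Python sketch of that idea follows; the candidate list, the function name `preview_candidates` and the cp1252 system code page are all assumptions for illustration and do not reflect the list Chacon actually offers.

```python
# Hypothetical shortlist of character sets to try.
CANDIDATES = ["shift_jis", "euc_jp", "gbk", "big5", "euc_kr", "cp1251"]


def preview_candidates(mangled: str,
                       system_codepage: str = "cp1252",
                       candidates=CANDIDATES) -> dict:
    """Map each candidate charset to its decoded text, or to None when the
    raw bytes are not valid in that charset."""
    raw = mangled.encode(system_codepage)
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


mangled = "日本語".encode("shift_jis").decode("cp1252")
previews = preview_candidates(mangled)
```

A result of `None` rules a candidate out, but a successful decode does not prove it is right; the output still has to be judged by eye.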

Now let's try the second file.


As you can see, in this case, choosing only the original charset doesn't yield a proper result.

This file was "damaged" on a computer running with a different system locale, so we have to choose the correct preconversion first.


OK, seems like it was Latin 1.
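The situation of this second file can be sketched in Python as well. The sketch assumes that the tag was written in Shift-JIS, misread as Latin 1 on another machine and resaved as Unicode, and that the local system code page is cp1251; these specifics, and the function name `convert`, are assumptions for illustration only.

```python
def convert(text: str, preconversion: str, original_charset: str) -> str:
    """Apply an explicit preconversion, then the original charset
    (hypothetical helper). Unconvertible characters are replaced
    rather than raising, so bad settings show up as garbage."""
    raw = text.encode(preconversion, errors="replace")
    return raw.decode(original_charset, errors="replace")


# The other machine read the Shift-JIS bytes as Latin 1 and stored
# the resulting text as Unicode.
stored = "日本語".encode("shift_jis").decode("latin-1")

# Reversing with the local system code page (assumed cp1251) fails...
wrong = convert(stored, "cp1251", "shift_jis")

# ...while reversing with the code page of the damaging machine works.
right = convert(stored, "latin-1", "shift_jis")
```

The preconversion has to match the code page used when the damage was done, not the code page of the machine doing the repair.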

Sometimes it is not immediately obvious whether the converted titles are right, especially when they are in a foreign script you don't understand. Correctness is often more apparent with a whole album loaded (when you see no question marks or missing parts in any of the titles, not just one), but errors can still occur. It often helps to search for the output text on the Internet and see whether any pages turn up that are related to the specified artist, contain track listings of the album, or at least confirm you got a valid word.
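The "no question marks in any of the titles" check can be approximated with a crude heuristic. The Python sketch below is only an aid for spotting obviously failed conversions; the function name `suspicious_titles` is hypothetical, and a title that legitimately contains a question mark will be flagged too.

```python
# Characters that typically appear when a conversion could not map a byte.
SUSPICIOUS = {"?", "\ufffd"}


def suspicious_titles(titles):
    """Return the titles that contain a question mark or the Unicode
    replacement character (hypothetical helper)."""
    return [t for t in titles if any(ch in SUSPICIOUS for ch in t)]


flagged = suspicious_titles(["日本語", "???{??", "abc\ufffddef"])
# flagged == ["???{??", "abc\ufffddef"]
```

An empty result does not guarantee correctness, which is why checking the text against an external source is still worthwhile.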

To make checking the results this way easier, you can copy the current output to the clipboard using the context menu: either only the contents of the current (clicked) column or the whole line. If you select multiple rows, the tags for all of them are copied.


After you click "Apply", the tags are resaved using the proper Unicode encoding and should never bother you again. As long as you don't use some broken application, that is.

The selected conversion settings will be remembered, so if you regularly need to fix files garbled the same way, it will be just a few clicks.

Good luck!