Problems with gettext and msgfmt

JKPeck jkpeck at gmail.com
Wed Dec 16 05:12:51 CET 2009


I'm using Python 2.6 on Windows and having trouble with the charset in
gettext.  It seems so broken that I must be missing something.

When I run msgfmt.py, as far as I can see it writes no charset
information into the .mo file, even though the .po files in this case
are UTF-8 and have a charset declaration.
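One way to confirm this is to look inside the .mo file itself. By GNU convention the catalog header, including the "Content-Type: text/plain; charset=..." line, is stored as the translation of the empty msgid "", so if that entry is absent gettext has no charset to apply. The Python 3 sketch below reads that entry. The names build_mo, read_mo_header and declared_charset are hypothetical helpers, and build_mo writes only a minimal layout (sorted entries, no hash table) so the example can run without any files.

```python
import struct

LE_MAGIC = 0x950412DE  # .mo magic number when read as little-endian
BE_MAGIC = 0xDE120495  # the same magic number in a big-endian file


def build_mo(messages):
    """Hypothetical helper: write a minimal little-endian .mo (no hash table)."""
    # Entries are stored sorted by msgid bytes, so the header ("") comes first.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)
    data_start = 28 + 16 * n  # 7-word header, then the key and value index tables
    keys, values = [], []
    koff, voff = data_start, data_start + len(ids)
    for k, v in items:
        keys += [len(k), koff]  # (length without the NUL, file offset)
        values += [len(v), voff]
        koff += len(k) + 1
        voff += len(v) + 1
    header = struct.pack("<7I", LE_MAGIC, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + struct.pack("<%dI" % (4 * n), *(keys + values)) + ids + strs


def read_mo_header(data):
    """Return the msgstr stored for the empty msgid, or None if there is none."""
    (magic,) = struct.unpack("<I", data[:4])
    if magic == LE_MAGIC:
        order = "<"
    elif magic == BE_MAGIC:
        order = ">"
    else:
        raise ValueError("not a .mo file")
    _, count, key_index, value_index = struct.unpack(order + "4I", data[4:20])
    for i in range(count):
        klen, _ = struct.unpack_from(order + "2I", data, key_index + 8 * i)
        if klen == 0:  # the empty msgid carries the catalog header
            vlen, voff = struct.unpack_from(order + "2I", data, value_index + 8 * i)
            # latin-1 never fails; only the ASCII field names matter here
            return data[voff:voff + vlen].decode("latin-1")
    return None


def declared_charset(data):
    """Extract the charset=... value from the header's Content-Type line."""
    header = read_mo_header(data)
    for line in (header or "").splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type" and "charset=" in value:
            return value.split("charset=", 1)[1].strip()
    return None


good = build_mo({"": "Content-Type: text/plain; charset=UTF-8\n", "Hello": "Grüß dich"})
bad = build_mo({"Hello": "Grüß dich"})
print(declared_charset(good))  # → UTF-8
print(declared_charset(bad))   # → None
```

For a real catalog, declared_charset(open("messages.mo", "rb").read()) would do the same check (the file name is a placeholder); a result of None means the .mo file carries no charset declaration at all.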

Then, when _parse in gettext loads the messages, it does no conversion
to Unicode, because it has no charset information.  So the message
dictionary actually holds UTF-8 byte strings, despite this comment in
the code:

    # Note: we unconditionally convert both msgids and msgstrs to
    # Unicode using the character encoding specified in the charset
    # parameter of the Content-Type header.
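The conversion that comment describes can be observed in current Python 3 by handing GNUTranslations an in-memory catalog, with and without the header entry. Note that Python 3 behaves differently from the 2.6 behaviour described above: when no charset is declared it decodes with ASCII and raises, rather than keeping byte strings. This is a sketch only; build_mo is a hypothetical helper that writes a minimal .mo layout (sorted entries, no hash table).

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: write a minimal little-endian .mo (no hash table)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)
    data_start = 28 + 16 * n  # 7-word header, then key and value index tables
    keys, values = [], []
    koff, voff = data_start, data_start + len(ids)
    for k, v in items:
        keys += [len(k), koff]
        values += [len(v), voff]
        koff += len(k) + 1
        voff += len(v) + 1
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + struct.pack("<%dI" % (4 * n), *(keys + values)) + ids + strs


HEADER = "Content-Type: text/plain; charset=UTF-8\n"

# Header present: _parse reads charset=UTF-8 and stores str objects.
catalog = gettext.GNUTranslations(io.BytesIO(build_mo({"": HEADER, "Hello": "Grüß dich"})))
print(catalog.charset())         # → UTF-8
print(catalog.gettext("Hello"))  # → Grüß dich

# Header missing: Python 3 falls back to ASCII, which cannot decode UTF-8 bytes.
try:
    gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Grüß dich"})))
except UnicodeDecodeError as exc:
    print("no charset declared:", exc)
```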

Then ugettext either returns the translated message as is, which is
not Unicode, or tries to convert it to Unicode, which fails because
the unicode() call specifies no encoding.

The _parse code seems intended to produce a Unicode translation
dictionary, and gettext expects to encode that Unicode into the
current code page, but the message dictionary is never converted to
Unicode in the first place.
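The two conversions involved can be shown with plain codecs. In Python 2, unicode(s) with no encoding argument uses the default codec, which is normally ASCII, so any UTF-8 byte above 0x7F makes it fail. The Python 3 sketch below names the codecs explicitly to reproduce that; decode_or_none is a hypothetical helper, and cp1252 is only assumed as a typical Windows code page (the real one depends on the system locale).

```python
# Catalog entry as stored when no charset was applied: raw UTF-8 bytes.
raw = "Grüß dich".encode("utf-8")


def decode_or_none(data, encoding):
    """Decode bytes to text, returning None instead of raising on bad input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# What an encoding-less unicode() call amounts to with the default ASCII codec.
print(decode_or_none(raw, "ascii"))  # → None
# Decoding with the charset declared in the .po header works.
text = decode_or_none(raw, "utf-8")
print(text)                          # → Grüß dich
# Only once the text is Unicode can it be encoded for the code page.
print(text.encode("cp1252"))         # → b'Gr\xfc\xdf dich'
```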

What I want is simply to use UTF-8 .po files and get translations in
Unicode.
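Assuming the header entry cannot be made to survive into the .mo file, one possible workaround is a GNUTranslations subclass that treats catalogs with no declared charset as UTF-8. This is a Python 3 sketch, not a supported API: it relies on the private _parse() method and _charset attribute of CPython's gettext module. The class name UTF8FallbackTranslations and the helper build_mo (a minimal .mo writer used only to make the example runnable) are hypothetical.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: write a minimal little-endian .mo (no hash table)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)
    data_start = 28 + 16 * n
    keys, values = [], []
    koff, voff = data_start, data_start + len(ids)
    for k, v in items:
        keys += [len(k), koff]
        values += [len(v), voff]
        koff += len(k) + 1
        voff += len(v) + 1
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + struct.pack("<%dI" % (4 * n), *(keys + values)) + ids + strs


class UTF8FallbackTranslations(gettext.GNUTranslations):
    """GNUTranslations that assumes UTF-8 when a catalog declares no charset.

    Workaround sketch: depends on the private _parse()/_charset internals.
    """

    def _parse(self, fp):
        if self._charset is None:
            # Default only; a Content-Type header in the catalog still overrides it.
            self._charset = "utf-8"
        gettext.GNUTranslations._parse(self, fp)


catalog = UTF8FallbackTranslations(io.BytesIO(build_mo({"Hello": "Grüß dich"})))
print(catalog.gettext("Hello"))  # → Grüß dich
```

With real files the class could be passed as gettext.translation("myapp", "locale", ["de"], class_=UTF8FallbackTranslations), where the domain, directory and language are placeholders. Getting the charset header into the .mo file remains the cleaner fix, since the subclass only papers over its absence.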

TIA for any suggestions.

-Jon Peck



More information about the Python-list mailing list