Problems with gettext and msgfmt

Wed Dec 16 09:21:29 EST 2009

On Dec 15, 9:12 pm, JKPeck <jkp... at gmail.com> wrote:
> I'm using Python 2.6 on Windows and having trouble with the charset in
> gettext.  It seems to be so broken that I must be missing something.
>
> When I run msgfmt.py, as far as I can see it writes no charset
> information into the mo file.  The actual po files are in utf-8 in
> this case and have a charset declaration.
>
> Then when _parse in gettext loads the messages, it does no conversion
> to Unicode, because it has no charset information.  So the message
> dictionary is actually in utf-8 despite the comment in the code:
>             # Note: we unconditionally convert both msgids and msgstrs to
>             # Unicode using the character encoding specified in the charset
>             # parameter of the Content-Type header.
>
> Then ugettext tries to just return the translated message, which is
> not in Unicode, or to convert to Unicode, which fails because the
> unicode call is not specifying any encoding.
>
> The _parse code seems to expect to produce a Unicode translation
> dictionary, and gettext expects to encode Unicode into the current
> code page, but the message dictionary never gets mapped to Unicode in
> the first place.
>
> What I want is simply to use utf-8 po files and get translations in
> Unicode.
>
> TIA for any suggestions.
>
> -Jon Peck

Never mind.  I figured this out.  The problem is that a line such as
    _("")
in the scanned source causes all the meta information, including the
charset declaration, to be lost in the mo file.  (The empty msgid is
the one gettext reserves for the catalog header.)  Once I changed that
code, I got the expected result.