[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

Barry Warsaw barry@python.org
16 Apr 2003 12:52:06 -0400


On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:

> More or less, yes. Now, what happens if you put "real" non-ASCII
> (i.e. bytes above 127) into the message id, like so:

But I don't think you'd ever want to do that.  In fact, I think in
general you're probably talking about ASCII msgids or utf-8 encoded
Unicode msgids.  I'm not sure what else would make sense.

> msgfmt will still accept that, but msgunfmt will complain:

Didn't even know about msgunfmt. :)

> msgunfmt: warning: The following msgid contains non-ASCII characters.
>                    This will cause problems to translators who use a
>                    character encoding different from yours. Consider
>                    using a pure ASCII msgid instead.
> 
> If you think about this, this is really bad: If you mean to apply the
> charset= to both msgid and msgstr, then translators using a different
> charset from yours are in big trouble.

Right, but see above.  E.g. if your string literals are all Spanish and
you want a Turkish translation, then utf-8 is the only common encoding
you could possibly use in a .po file, right?
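
A rough illustration of that point, as a sketch only: the two sample
strings and the can_encode() helper below are made up for this example,
and it assumes Python 3. It shows Latin-1 holding the Spanish text but
not the Turkish one, while utf-8 holds both.

```python
# Hypothetical sample strings, not taken from any real catalog.
spanish = "¿Dónde está el archivo?"
turkish = "Dosya bulunamadı"


def can_encode(text, charset):
    """Return True if every character of text fits in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for charset in ("iso-8859-1", "utf-8"):
    print(charset, can_encode(spanish, charset), can_encode(turkish, charset))
```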

> They are faced with three problems:
> 1. They don't know what the charset of the msgids is. The PO files do
>    have a charset declaration, the POT files typically don't.

Yep, although it would be easy for the extractor to add a charset=utf-8
to the pot file.

> 2. They need to convert the msgids from the POT encoding to their
>    native encoding. There are no tools available to support that readily;
>    tools like iconv might correctly convert the msgids, but won't update
>    the charset= in the POT file (if the charset was filled out).
> 3. By converting the msgids, they are also changing them. That means
>    the msgids are not really suitable as keys anymore.

Is this still a problem when charset=utf-8?
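
To make point 3 concrete, here is a minimal sketch, assuming Python 3
and a catalog modeled as a plain dict keyed by the raw msgid bytes (an
assumption for illustration, not how gettext.py necessarily stores
things). The msgid and translation are invented. Recoding the msgid
changes its bytes, so the byte-level lookup misses, while decoding each
side with its own charset gives back the same text.

```python
# Hypothetical msgid containing non-ASCII characters.
msgid = "Archivo no encontrado: ¿reintentar?"

key_utf8 = msgid.encode("utf-8")         # bytes as the author wrote them
key_latin1 = msgid.encode("iso-8859-1")  # bytes after a translator recodes

# Toy catalog keyed by the original utf-8 bytes.
catalog = {key_utf8: "Dosya bulunamadı: yeniden denensin mi?"}

found_original = key_utf8 in catalog    # same bytes, so the lookup hits
found_recoded = key_latin1 in catalog   # different bytes, so it misses

# Decoding each key with its own charset yields identical text.
same_text = key_utf8.decode("utf-8") == key_latin1.decode("iso-8859-1")

print(found_original, found_recoded, same_text)
```

If everything stays in utf-8 end to end, no recoding step happens and
the keys are left alone, which is what the question above is getting at.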

-Barry