[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
Barry Warsaw
barry@python.org
16 Apr 2003 12:52:06 -0400
On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:
> More or less, yes. Now, what happens if you put "real" non-ASCII
> (i.e. bytes above 127) into the message id, like so:
But I don't think you'd ever want to do that. In fact, I think in
general you're probably talking about ascii msgids or utf-8 encoded
Unicode msgids. I'm not sure what else would make sense.
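To make those two cases concrete, here is a minimal Python 3 sketch that sorts raw msgid bytes into "ascii", "utf-8", or anything else. The helper name msgid_kind is hypothetical and not part of gettext.py.

```python
def msgid_kind(msgid: bytes) -> str:
    """Classify raw msgid bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper for illustration only; not part of gettext.py.
    """
    try:
        msgid.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        msgid.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


print(msgid_kind(b"Hello"))                          # ascii
print(msgid_kind("¿Dónde está?".encode("utf-8")))    # utf-8
print(msgid_kind("¿Dónde está?".encode("latin-1")))  # other
```

Note that bytes which happen to decode as UTF-8 are only probably UTF-8. A short Latin-1 string can occasionally form a valid UTF-8 sequence by accident.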
> msgfmt will still accept that, but msgunfmt will complain:
Didn't even know about msgunfmt. :)
> msgunfmt: warning: The following msgid contains non-ASCII characters.
> This will cause problems to translators who use a
> character encoding different from yours. Consider
> using a pure ASCII msgid instead.
>
> If you think about this, this is really bad: If you mean to apply the
> charset= to both msgid and msgstr, then translators using a different
> charset from yours are in big trouble.
Right, but see above. E.g. if your string literals are all Spanish and
you want a Turkish translation, then utf-8 is the only common encoding
you could possibly use in a .po file, right?
> They are faced with three problems:
> 1. They don't know what the charset of the msgids is. The PO files do
> have a charset declaration, the POT files typically don't.
Yep, although it would be easy for the extractor to add a charset=utf-8
to the pot file.
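A rough sketch of what that could look like, assuming the extractor writes the conventional PO header entry (an empty msgid whose msgstr carries MIME-style fields). The helpers pot_header and header_charset are hypothetical, and the regex is only a simple illustration, not how msgfmt or gettext.py actually parse the header.

```python
import re


def pot_header(charset: str = "UTF-8") -> str:
    """Build a header entry an extractor could emit at the top of a .pot file.

    Hypothetical helper; the field names follow the usual PO header layout.
    """
    fields = [
        "MIME-Version: 1.0",
        f"Content-Type: text/plain; charset={charset}",
        "Content-Transfer-Encoding: 8bit",
    ]
    lines = ['msgid ""', 'msgstr ""']
    # Each header field is a quoted string ending in a literal \n escape.
    lines += [f'"{field}\\n"' for field in fields]
    return "\n".join(lines) + "\n"


def header_charset(pot_text: str):
    """Return the declared charset, or None if the header has none."""
    match = re.search(r"charset=([A-Za-z0-9_.:-]+)", pot_text)
    return match.group(1) if match else None


print(pot_header())
print(header_charset(pot_header()))  # UTF-8
```

With the charset declared this way, a translator's tool would at least know what encoding the msgids are in (Martin's problem 1 below).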
> 2. They need to convert the msgids from the POT encoding to their
> native encoding. There are no tools available to support that readily;
> tools like iconv might correctly convert the msgids, but won't update
> the charset= in the POT file (if the charset was filled out).
> 3. By converting the msgids, they are also changing them. That means
> the msgids are not really suitable as keys anymore.
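Point 3 can be illustrated with a minimal sketch, assuming a catalog simplified to a plain dict keyed by raw msgid bytes (a stand-in for a compiled .mo file, not gettext.py's actual data structure). The program's source and POT are in Latin-1; a translator converts the POT to UTF-8 before translating, so the key in their catalog no longer matches what the program looks up.

```python
# Simplified model: a catalog is a dict mapping raw msgid bytes to msgstr bytes.
# This is an illustrative assumption, not how gettext.py stores catalogs.

source_literal = "¿Dónde está?"

# The programmer's source code and POT file use Latin-1.
program_key = source_literal.encode("latin-1")

# The translator converts the POT to UTF-8 (e.g. with iconv) before translating,
# so the msgid written into their catalog is a different byte string.
translator_msgid = program_key.decode("latin-1").encode("utf-8")
catalog = {translator_msgid: "Nerede?".encode("utf-8")}


def lookup(catalog, key):
    # gettext-style fallback: return the untranslated msgid on a miss.
    return catalog.get(key, key)


print(program_key == translator_msgid)              # False
print(lookup(catalog, program_key) == program_key)  # True: translation never found
```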
Is this still a problem when charset=utf-8?
-Barry