[I18n-sig] Changes to gettext.py for Python 2.3
Martin v. Löwis
martin@v.loewis.de
12 Apr 2003 13:43:28 +0200
Barry Warsaw <barry@python.org> writes:
> I used standard msgfmt to turn that into a .mo file. Then created a
> GNUTranslation(fp, coerce=True) and called
>
> >>> t.ugettext(u'ab\xde')
> u'\xa4yz'
>
> This is what I should expect, right? ;)
More or less, yes. Now, what happens if you put "real" non-ASCII
(i.e. bytes above 127) into the message id, like so:
msgid "abö"
msgstr "\xc2\xa4yz"
msgfmt will still accept that, but msgunfmt will complain:
msgunfmt: warning: The following msgid contains non-ASCII characters.
This will cause problems to translators who use a
character encoding different from yours. Consider
using a pure ASCII msgid instead.
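For illustration only, a check of the kind this warning implies can be sketched in a few lines. This is not msgunfmt's actual code; the helper name has_non_ascii is hypothetical, and the sketch is written for present-day Python 3 rather than 2.3.

```python
def has_non_ascii(msgid: bytes) -> bool:
    """Return True if the raw msgid contains any byte above 127."""
    return any(byte > 127 for byte in msgid)


# A pure-ASCII msgid, and the msgid from the example above as it would
# be stored if the PO file were written in Latin-1 (an assumption; the
# file could equally have been written in another charset).
ascii_id = b"ab"
latin1_id = "ab\u00f6".encode("latin-1")

for raw in (ascii_id, latin1_id):
    if has_non_ascii(raw):
        print("warning: msgid %r contains non-ASCII characters" % raw)
```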
If you think about this, this is really bad: if you mean to apply the
charset= declaration to both msgid and msgstr, then translators using a
different charset from yours are in big trouble.
They are faced with three problems:
1. They don't know what the charset of the msgids is. PO files do
have a charset declaration; POT files typically don't.
2. They need to convert the msgids from the POT encoding to their
native encoding. There are no tools available to support that readily;
tools like iconv might correctly convert the msgids, but won't update
the charset= in the POT file (if the charset was filled out).
3. By converting the msgids, they are also changing them. That means
the msgids are not really suitable as keys anymore.
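Problem 3 can be made concrete with a small sketch. It assumes the catalog is keyed on the raw msgid bytes, and that the original file was Latin-1 while the translator converts to UTF-8. Both charsets and the dictionary named catalog are assumptions for illustration; this is not the actual .mo lookup code, and it uses Python 3 syntax.

```python
msgid = "ab\u00f6"  # the "abö" msgid from the example above

# The same msgid as raw bytes in two different charsets.
key_latin1 = msgid.encode("latin-1")  # original encoding (assumed)
key_utf8 = msgid.encode("utf-8")      # after the translator converts it

# A stand-in for the compiled catalog, keyed on the original bytes.
# The msgstr bytes are the UTF-8 form of u'\xa4yz'.
catalog = {key_latin1: b"\xc2\xa4yz"}

found_original = catalog.get(key_latin1)
found_converted = catalog.get(key_utf8)

print(found_original)   # the translation is found
print(found_converted)  # the converted key no longer matches
```

Once the bytes of the msgid change, the lookup misses, even though both keys spell the same text to a human reader.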
Regards,
Martin