[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

Martin v. Löwis martin@v.loewis.de
12 Apr 2003 13:43:28 +0200

Barry Warsaw <barry@python.org> writes:

> I used standard msgfmt to turn that into a .mo file.  Then created a
> GNUTranslation(fp, coerce=True) and called
> >>> t.ugettext(u'ab\xde')
> u'\xa4yz'
> This is what I should expect, right? ;)

More or less, yes. Now, what happens if you put "real" non-ASCII
(i.e. bytes above 127) into the message id, like so:

msgid "abö"
msgstr "\xc2\xa4yz"

msgfmt will still accept that, but msgunfmt will complain:

msgunfmt: warning: The following msgid contains non-ASCII characters.
                   This will cause problems to translators who use a
                   character encoding different from yours. Consider
                   using a pure ASCII msgid instead.

If you think about this, this is really bad: If you mean to apply the
charset= to both msgid and msgstr, then translators using a different
charset from yours are in big trouble.

They are faced with three problems:
1. They don't know what the charset of the msgids is. The PO files do
   have a charset declaration, but the POT files typically don't.
2. They need to convert the msgids from the POT encoding to their
   native encoding. There are no tools available to support that readily;
   tools like iconv might correctly convert the msgids, but won't update
   the charset= in the POT file (if the charset was filled out).
3. By converting the msgids, they are also changing them. That means
   the msgids are not really suitable as keys anymore.
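
To make the third point concrete, here is a minimal sketch in present-day
Python (not the 2.3 gettext code under discussion). It assumes the msgid
was originally saved as ISO-8859-1 and that a translator converted the
file to UTF-8; the dict and the lookup() helper are hypothetical stand-ins
for a compiled catalog, not part of gettext.py.

```python
# Hypothetical illustration: the same msgid stored under two charsets.
msgid = "ab\u00f6"  # the text "abö"

key_latin1 = msgid.encode("iso-8859-1")  # bytes as the program author saved them
key_utf8 = msgid.encode("utf-8")         # bytes after a translator converts the file

# A toy catalog keyed on raw bytes, standing in for a compiled .mo file.
# The translator's catalog only knows the converted (UTF-8) key.
catalog = {key_utf8: "\u00a4yz".encode("utf-8")}


def lookup(catalog, key):
    """Return the translation for key, or the key itself if it is missing."""
    return catalog.get(key, key)


# The two byte strings differ, so the original key no longer matches.
print(key_latin1 == key_utf8)
print(lookup(catalog, key_latin1))
print(lookup(catalog, key_utf8))
```

The lookup with the original bytes falls through to the untranslated
msgid, which is the sense in which a converted msgid stops working as a key.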