[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

"Martin v. Löwis" martin@v.loewis.de
Thu, 17 Apr 2003 00:07:15 +0200


Barry Warsaw wrote:

> So why isn't the English/US-ASCII bias for msgids considered a liability
> for gettext?  Do non-English programmers not want to use native literals
> in their source code?

Using English for msgids is about the only way to get translations.
Finding a Turkish speaker who can translate from Spanish is
*significantly* more difficult than starting from English; if you were
starting from, say, Chinese, going to Hebrew might just be impossible.

So any programmer who seriously wants to have his software translated 
will put English texts into the source code. Non-English literals are 
only used if l10n is not an issue.

> If we adhere to this limitation instead of extending gettext then it
> seems like Zope will be forced to use something else, and that seems
> like a waste.  

It's not a limitation of gettext, but a usage guideline: gettext can map 
arbitrary byte strings to arbitrary other byte strings.
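
As a rough illustration of that point, here is a minimal sketch using
the gettext module as it exists in current Python 3 (not the 2.3 API
under discussion). The helper build_mo is hypothetical and only writes
the bare .mo layout (no hash table); the catalog contents are made up.
Note that the Python 3 module decodes both msgids and msgstrs using the
charset declared in the catalog header, so the lookup key below is a
str, whereas the .mo file itself stores plain byte strings.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: serialize {msgid_bytes: msgstr_bytes} as a .mo file."""
    items = sorted(messages.items())        # msgids must be sorted; b"" comes first
    n = len(items)
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)

    id_table_off = 7 * 4                    # tables start right after the 7-word header
    str_table_off = id_table_off + 8 * n    # each table entry is (length, offset)
    ids_off = str_table_off + 8 * n
    strs_off = ids_off + len(ids)

    # magic, version, count, msgid table, msgstr table, hash size, hash offset
    out = struct.pack("<7I", 0x950412DE, 0, n, id_table_off, str_table_off, 0, 0)
    pos = ids_off
    for k, _ in items:
        out += struct.pack("<2I", len(k), pos)
        pos += len(k) + 1                   # +1 for the NUL terminator
    pos = strs_off
    for _, v in items:
        out += struct.pack("<2I", len(v), pos)
        pos += len(v) + 1
    return out + ids + strs


# Made-up catalog: one non-ASCII (German) msgid, one ASCII msgid.
catalog = {
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    "Grüße".encode("utf-8"): "Greetings".encode("utf-8"),
    b"Hello": "Merhaba".encode("utf-8"),
}

t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(t.gettext("Grüße"))
print(t.gettext("Hello"))
```

Nothing in the file format cares whether the msgid is English or ASCII;
the difficulty is in finding translators, not in the lookup.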

> BTW, I believe that if all your msgids /are/ us-ascii, you should be
> able to ignore this change and have it work backwards compatibly.

"This" change being the addition of the "coerce" argument? If you think
you will need it, we can leave it in.

>>If the msgids are UTF-8, with non-ASCII characters C-escaped,
>>translators will *still* put non-UTF-8 encodings into the catalogs.
>>This will then be a problem: The catalog encoding won't be UTF-8,
>>and you can't process the msgids.
> 
> 
> Isn't this just another validation step to run on the .po files?  There
> are already several ways translators can (and do!) make mistakes, so we
> already have to validate the files anyway.

I'm not sure how exactly a validation step would be executed. Would that
step simply verify that the encoding of a catalog is UTF-8? That 
validation step would fail for catalogs that legally use other charsets.
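
To make the concern concrete, here is a small sketch of two possible
checks on a .po file. Everything in it is an assumption for
illustration: the function names are hypothetical, the charset is found
with a simple regular expression rather than a real .po parser, and the
sample catalogs are made up. The first check insists on UTF-8; the
second only verifies that the file decodes under whatever charset its
header declares.

```python
import re

# Naive search for the charset in the Content-Type header line of a .po file.
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def declared_charset(po_bytes):
    """Return the charset named in the catalog header, or None if absent."""
    m = CHARSET_RE.search(po_bytes)
    return m.group(1).decode("ascii") if m else None


def check_utf8_only(po_bytes):
    """Strict check: accept the catalog only if it declares UTF-8."""
    return (declared_charset(po_bytes) or "").lower() in ("utf-8", "utf8")


def check_declared(po_bytes):
    """Lenient check: accept the catalog if it decodes under its own charset."""
    charset = declared_charset(po_bytes)
    if charset is None:
        return False
    try:
        po_bytes.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return False
    return True


PO_TEMPLATE = (
    'msgid ""\n'
    'msgstr "Content-Type: text/plain; charset=%s\\n"\n'
    "\n"
    'msgid "Greetings"\n'
    'msgstr "Grüße"\n'
)

# A catalog that legally uses ISO-8859-1.
latin1_po = (PO_TEMPLATE % "ISO-8859-1").encode("iso-8859-1")
# A catalog that claims UTF-8 but was saved as ISO-8859-1.
mislabeled_po = (PO_TEMPLATE % "UTF-8").encode("iso-8859-1")

print(check_utf8_only(latin1_po), check_declared(latin1_po))
print(check_utf8_only(mislabeled_po), check_declared(mislabeled_po))
```

The strict check rejects the legitimate ISO-8859-1 catalog, while the
lenient one accepts it and catches the mislabeled file. The lenient
check is weak for single-byte charsets, though, since nearly any byte
sequence decodes under them.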

Regards,
Martin