[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

16 Apr 2003 15:36:08 -0400

On Wed, 2003-04-16 at 15:20, Martin v. Löwis wrote:
> Barry Warsaw <barry@python.org> writes:
> 
> > Right, but see above.  E.g. if your string literals are all Spanish and
> > you want a Turkish translation, then utf-8 is the only common encoding
> > you could possibly use in a .po file, right?
> 
> That's why your string literals should never be all Spanish. If you
> have Spanish string literals and use escape codes in the msgid,
> reading the Spanish msgid becomes difficult, anyway.

So why isn't the English/US-ASCII bias for msgids considered a liability
for gettext?  Do non-English-speaking programmers not want to use native
literals in their source code?

If we adhere to this limitation instead of extending gettext, then it
seems like Zope will be forced to use something else, and that seems
like a waste.  Zope's msgids come from sources other than program source
code, and such sources may well be written in languages other than
English.  gettext is so close, with nearly all the machinery already
there, that this small enhancement should be harmless and helpful.

BTW, I believe that if all your msgids /are/ us-ascii, you should be
able to ignore this change and have it work backward-compatibly.
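
A minimal sketch of why that should hold, assuming the catalog charset
is ASCII-compatible (utf-8 and the iso-8859-* family are; something like
utf-16 would not be).  It is written in Python 3 syntax, and
decode_msgid is a hypothetical helper for illustration, not part of
gettext.py:

```python
# Hypothetical helper: decode a raw msgid using the catalog's declared
# charset.  This name is illustrative only; it is not a gettext API.
def decode_msgid(raw_msgid: bytes, charset: str) -> str:
    return raw_msgid.decode(charset)


# A pure us-ascii msgid, as found in a traditional catalog.
ascii_msgid = b"Hello, world"

# Decode it under several ASCII-compatible charsets a catalog might declare.
charsets = ("us-ascii", "utf-8", "iso-8859-1", "iso-8859-9")
decoded = {cs: decode_msgid(ascii_msgid, cs) for cs in charsets}

# Every charset yields the same text, so the msgid still works as a key.
all_same = len(set(decoded.values())) == 1
print(all_same)
```

The point is only that a us-ascii msgid comes out the same whichever of
these charsets the catalog declares, so existing lookups keep matching.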

Also, this change ought to visibly affect only .ugettext(), which isn't
part of the traditional gettext API anyway.

> > > 3. By converting the msgids, they are also changing them. That means
> > >    the msgids are not really suitable as keys anymore.
> > 
> > Is this still a problem for when charset=utf-8?
> 
> If the msgids are UTF-8, with non-ASCII characters C-escaped,
> translators will *still* put non-UTF-8 encodings into the catalogs.
> This will then be a problem: The catalog encoding won't be UTF-8,
> and you can't process the msgids.

Isn't this just another validation step to run on the .po files?  There
are already several ways translators can (and do!) make mistakes, so we
already have to validate the files anyway.
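
Roughly the kind of check I have in mind, as a sketch only.  It assumes
the policy is "catalogs must be utf-8", looks for the charset= parameter
in the .po header, and then tries to decode each line.  The name
check_po_encoding is hypothetical (it is not part of gettext or any
existing tool), and the syntax is Python 3:

```python
import re

# Matches the charset parameter in a header line such as
#   "Content-Type: text/plain; charset=UTF-8\n"
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.-]+)")


def check_po_encoding(po_bytes: bytes, expected: str = "utf-8") -> list:
    """Return a list of problem descriptions; an empty list means OK.

    Hypothetical validator: it checks the declared charset and that
    every line of the file decodes under the expected encoding.
    """
    match = CHARSET_RE.search(po_bytes)
    if match is None:
        return ["no charset declared in header"]

    problems = []
    declared = match.group(1).decode("ascii").lower()
    if declared != expected:
        problems.append(
            "declared charset is %r, expected %r" % (declared, expected)
        )

    # Decode line by line so the report can point at the offending line.
    for lineno, line in enumerate(po_bytes.splitlines(), start=1):
        try:
            line.decode(expected)
        except UnicodeDecodeError:
            problems.append("line %d: not valid %s" % (lineno, expected))
    return problems
```

A translator who saves the file in, say, iso-8859-1 while the header
still claims utf-8 would be caught by the per-line decode, which is the
mistake Martin is describing.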

-Barry