[I18n-sig] Re: gettext in the standard library

François Pinard pinard@iro.umontreal.ca
04 Sep 2000 13:49:33 -0400

[Martin von Loewis]

> > I do not see, nor understand, why we should have special API provisions
> > for Unicode.  I thought a great effort had been put into the design of
> > Unicode support so that it would be as transparent as possible.  Isn't
> > making Unicode explicit going against this spirit?

> In Python 2, unicode strings are a separate type from byte strings.
> The catalog objects will have two methods, one for retrieving a byte
> string, as it appears in the .mo file, and one for retrieving a Unicode
> string.  It is then the application developer's choice whether the
> application can deal with Unicode messages on output or not.

Here you are merely restating that there is a special API for Unicode.
I got that already! :-)  My question is why it is necessary.
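For concreteness, here is a minimal Python 3 sketch of the two-method design described above.  The class and method names (`TwoMethodCatalog`, `byte_gettext`, `unicode_gettext`) are hypothetical, and the catalog is modelled as a plain dict mapping msgid to raw msgstr bytes rather than a real .mo file.

```python
class TwoMethodCatalog:
    """Hypothetical catalog exposing separate byte and Unicode lookups."""

    def __init__(self, messages, charset):
        # messages: msgid (str) -> msgstr as raw bytes, as stored in the file
        self._messages = messages
        self._charset = charset

    def byte_gettext(self, msgid):
        # Return the msgstr undecoded; untranslated msgids come back as ASCII bytes.
        return self._messages.get(msgid, msgid.encode("ascii"))

    def unicode_gettext(self, msgid):
        # Decode the stored bytes using the catalog's declared charset.
        raw = self._messages.get(msgid)
        if raw is None:
            return msgid
        return raw.decode(self._charset)


catalog = TwoMethodCatalog({"Hello": "Привет".encode("koi8_r")}, "koi8_r")
raw = catalog.byte_gettext("Hello")      # KOI8-R bytes, exactly as in the catalog
text = catalog.unicode_gettext("Hello")  # decoded Unicode text
```

The caller has to pick one of the two methods at every call site, which is the extra burden being questioned here.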

> You can't be certain that the encoding of the catalog msgstrs is the
> same as the user's.  For example, the catalog may use KOI-8, whereas
> the user's terminals are all in UTF-8.  So you have to know the
> catalog's encoding.

Yes, it is described in the PO file header (the translation of the empty
string).  The idea is to convert from KOI-8 (or whatever) while retrieving
the translation.  Most of the time, the conversion will be to Unicode.
In some very rare cases, as for Dutch, ASCII is sufficient.  All this
can be done automatically, so I do not see why we need two APIs.
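The automatic conversion described above could be sketched as follows, assuming the catalog is a dict mapping msgid to raw msgstr bytes, with the header stored as the translation of the empty msgid.  The names `charset_from_header` and `AutoCatalog` are hypothetical, and treating a header without a declared charset as ASCII is an assumption of this sketch.

```python
import re


def charset_from_header(header):
    """Extract the charset from a PO header (the msgstr of the empty msgid)."""
    for line in header.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            match = re.search(r"charset=([^\s;]+)", value)
            if match:
                return match.group(1)
    return "ascii"  # assumption: no declared charset means plain ASCII


class AutoCatalog:
    """Hypothetical single-method catalog: gettext() always returns str."""

    def __init__(self, raw_messages):
        # The header itself is assumed to be plain ASCII.
        header = raw_messages.get("", b"").decode("ascii")
        self._charset = charset_from_header(header)
        self._messages = raw_messages

    def gettext(self, msgid):
        raw = self._messages.get(msgid)
        if raw is None:
            return msgid  # untranslated: the original, usually ASCII, text
        return raw.decode(self._charset)


cat = AutoCatalog({
    "": b"Content-Type: text/plain; charset=KOI8-R\n",
    "File": "Файл".encode("koi8_r"),
})
translated = cat.gettext("File")    # decoded from KOI8-R automatically
untranslated = cat.gettext("Quit")  # returned unchanged
```

The charset is looked up once, when the catalog is loaded, so the application only ever sees one method and one string type.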

> the Python installation may not have the converter from the .mo file's
> encoding to Unicode.

I thought Python 2.0 was to come with a comprehensive set of conversion
routines for doing such things.  If we ever find that one is missing,
we should try to add it, shouldn't we?
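A missing converter can at least be detected once, when a catalog is loaded, rather than on every lookup.  This is a minimal sketch using the standard `codecs.lookup`, which raises `LookupError` for an unknown encoding; the helper names `has_converter` and `require_converter` are hypothetical.

```python
import codecs


def has_converter(charset):
    """Return True if this Python installation has a codec for `charset`."""
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True


def require_converter(charset):
    """Fail early, at catalog load time, if the catalog's charset is unsupported."""
    if not has_converter(charset):
        raise LookupError("no codec for catalog charset %r" % charset)
    # Return the codec's canonical name.
    return codecs.lookup(charset).name
```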

> Also, how would one determine, for a given target language, whether
> Unicode is a better representation for its messages than some MBCS?

Oh, no doubt this may lead to heated debates.  I thought that Python was
trying to give special treatment to Unicode.  You may remember (I am not
sure) that I tried to warn people that Unicode is not the be-all and
end-all.  I guess you are saying the same thing here. :-)

For translation purposes, I thought Python was to produce either ASCII
or UTF-8 more or less automatically on output.  The output is likely to
be a mix, since the original strings are usually written in ASCII and
not all of them get translated.  If something else is needed on output,
I thought the intent was to override UTF-8 as the output encoding, while
still using Unicode internally rather than any MBCS, taking advantage of
all the magic Python 2.0 will have in that respect.  Otherwise, you have
to make your Python scripts far more aware of these encodings, and
internationalisation becomes much more intrusive in your sources, whereas
we wanted it to be as lightweight as possible.
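A minimal sketch of the "Unicode inside, encode only on output" approach described above, assuming Python 3 `str` as the internal representation.  The function name `emit` and its `encoding` parameter are illustrative assumptions: UTF-8 is the default and can be overridden per call.  Untranslated ASCII strings encode to the same bytes under UTF-8, which is why a mixed output is harmless.

```python
import io
import sys


def emit(message, stream=None, encoding="utf-8"):
    """Write a Unicode message, encoding it only at the output boundary.

    `stream` must accept bytes; it defaults to the raw stdout buffer.
    Characters the target encoding cannot represent are replaced with '?'.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write((message + "\n").encode(encoding, errors="replace"))


buf = io.BytesIO()
emit("Quit", stream=buf)                                 # untranslated ASCII msgid
emit("Fichier modifié", stream=buf)                      # translated, default UTF-8
emit("Fichier modifié", stream=buf, encoding="latin-1")  # overridden output encoding
```

The script itself never mentions encodings except at the single point where text leaves the program.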

> > Also, what does "GNUTranslations" above mean?  What is especially "GNU"
> > about the act of translating?  Shouldn't we just avoid any "GNU"
> > references?

> The format of the catalog files is defined by GNU gettext.

Let's avoid "GNU" in the terminology, if we are avoiding the GPL.  They
usually go together! :-) And besides, I think we should not overly insist,
either in the documentation or in the API, on the fact that a particular
`gettext' is used underneath.

François Pinard   http://www.iro.umontreal.ca/~pinard