[I18n-sig] Re: gettext in the standard library

Martin von Loewis loewis@informatik.hu-berlin.de
Mon, 4 Sep 2000 15:42:57 +0200 (MET DST)

> I do not see, nor understand, why we should have special API provisions
> for Unicode.  I thought a great effort has been put in Unicode support
> design so it would be as transparent as possible.  Isn't making Unicode
> explicit going against this spirit?

In Python 2, unicode strings are a separate type from byte
strings. The catalog objects will have two methods: one for retrieving
a byte string, exactly as it appears in the .mo file, and one for
retrieving a unicode string. It is then up to the application developer
to decide whether the application can deal with Unicode messages on
output or not.

The core issue is that catalogs only map byte strings to byte strings.
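To make the two-method idea concrete, here is a minimal sketch in modern Python (bytes vs. str standing in for Python 2's byte and unicode strings). The class `ByteCatalog` and its method names are hypothetical, chosen only for illustration: the catalog stores raw msgstr bytes, the byte-string method hands them back untouched, and the Unicode method can only work if the catalog's charset is known.

```python
# Hypothetical sketch: ByteCatalog and its method names are illustrative,
# not an existing library API.
class ByteCatalog:
    def __init__(self, messages, charset=None):
        # messages maps msgid bytes to msgstr bytes, as stored in a .mo file.
        self._messages = dict(messages)
        # charset is the encoding of the msgstrs, if the catalog declares one.
        self._charset = charset

    def gettext(self, msgid):
        """Return the translation as a byte string (msgid if untranslated)."""
        return self._messages.get(msgid, msgid)

    def ugettext(self, msgid):
        """Return the translation as a Unicode string."""
        if self._charset is None:
            # Without the catalog's charset the bytes cannot be decoded safely.
            raise ValueError("catalog charset unknown")
        return self.gettext(msgid).decode(self._charset)


catalog = ByteCatalog({b"Hello": "Привет".encode("koi8_r")}, charset="koi8_r")
raw = catalog.gettext(b"Hello")    # KOI8-R bytes, passed through unchanged
text = catalog.ugettext(b"Hello")  # decoded to str using the catalog charset
```

An application that writes straight to a KOI8-R terminal can use the byte-string method and never touch Unicode; one that needs to mix messages from several catalogs would use the Unicode method.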

> Should not "_(...)" return either a simple string or a Unicode string,
> depending solely on the goal language?  Would not all the rest just fall
> out naturally from this choice?  What is that problem that I do not
> see?

You can't be certain that the encoding of the catalog's msgstrs is the
same as the user's. For example, the catalog may use KOI8, whereas
the user's terminals are all in UTF-8. So you have to know the
catalog's encoding. This, in turn, is only available if the catalog
follows the convention of carrying a valid Content-Type field in the
translation of the empty string. Even then, the Python installation
may not have a converter from the .mo file's encoding to Unicode.
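The two failure points above (no Content-Type field, or no converter for the declared charset) can be sketched as follows. The helper names `catalog_charset` and `decode_msgstr` are hypothetical; the only assumptions are that the header is the msgstr of the empty msgid and that it contains a line such as `Content-Type: text/plain; charset=KOI8-R`. `codecs.lookup` is the standard-library way to ask whether a converter exists.

```python
import codecs
import re


def catalog_charset(header):
    """Return the charset named in a catalog header (bytes), or None.

    Hypothetical helper: `header` is the msgstr stored for the empty msgid.
    """
    # Header fields are ASCII by convention; latin-1 decoding never fails.
    text = header.decode("latin-1")
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            match = re.search(r"charset\s*=\s*([^\s;]+)", value, re.IGNORECASE)
            if match:
                return match.group(1)
    return None


def decode_msgstr(msgstr, header):
    """Decode one msgstr to str, or raise ValueError if that is impossible."""
    charset = catalog_charset(header)
    if charset is None:
        # The catalog does not follow the Content-Type convention.
        raise ValueError("catalog does not declare its charset")
    try:
        codecs.lookup(charset)
    except LookupError:
        # This Python installation has no converter for the charset.
        raise ValueError(f"no codec available for {charset!r}") from None
    return msgstr.decode(charset)
```

Either ValueError corresponds to a case where a Unicode-returning `_()` simply cannot produce a correct answer, which is why returning raw bytes has to remain an option.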

Also, how would the goal language determine whether Unicode is a better
representation for messages than some MBCS encoding?

> Also, what means "GNUTranslations" above?  What is especially "GNU" in
> the act of translating?  Should not we just avoid any "GNU"
> references?

The format of the catalog files is defined by GNU gettext.
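For reference, a minimal sketch of that GNU .mo layout: a header of 32-bit words (magic 0x950412de, revision, string count N, offset of the msgid table, offset of the msgstr table, hash-table size and offset), followed by two tables of (length, offset) pairs pointing at NUL-terminated strings. The functions `write_mo` and `read_mo` are hypothetical helpers; the writer emits little-endian files with an empty hash table, which readers are allowed to ignore. Note that the format stores only bytes, with no encoding information outside the header entry.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number at the start of every GNU .mo file


def write_mo(messages):
    """Build a minimal little-endian .mo image from {msgid: msgstr} bytes."""
    ids = sorted(messages)          # msgids must be sorted
    n = len(ids)
    orig_table = 7 * 4              # header is seven 32-bit words
    trans_table = orig_table + n * 8
    data_start = trans_table + n * 8
    data = b""
    orig, trans = [], []
    for msgid in ids:
        orig.append((len(msgid), data_start + len(data)))
        data += msgid + b"\0"       # strings are NUL-terminated
    for msgid in ids:
        msgstr = messages[msgid]
        trans.append((len(msgstr), data_start + len(data)))
        data += msgstr + b"\0"
    # Hash table size 0: readers then fall back to the sorted tables.
    out = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table,
                      0, data_start)
    for length, offset in orig + trans:
        out += struct.pack("<2I", length, offset)
    return out + data


def read_mo(image):
    """Parse a .mo image into {msgid: msgstr}, both kept as raw bytes."""
    if image[:4] == struct.pack("<I", MO_MAGIC):
        order = "<"
    elif image[:4] == struct.pack(">I", MO_MAGIC):
        order = ">"
    else:
        raise ValueError("not a GNU .mo file")
    _revision, n, orig_table, trans_table = struct.unpack(
        order + "4I", image[4:20])
    messages = {}
    for i in range(n):
        olen, ooff = struct.unpack_from(order + "2I", image, orig_table + 8 * i)
        tlen, toff = struct.unpack_from(order + "2I", image, trans_table + 8 * i)
        messages[image[ooff:ooff + olen]] = image[toff:toff + tlen]
    return messages
```

Round-tripping a small catalog through `write_mo` and `read_mo` returns the same byte-to-byte mapping, including the header entry under the empty msgid.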