[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

"Martin v. Löwis" martin@v.loewis.de
Fri, 11 Apr 2003 21:54:50 +0200


Barry Warsaw wrote:

> - Set the default charset to iso-8859-1.  It used to be None, which
> would cause problems with .ugettext() if the file had no charset
> parameter.  Arguably, the po/mo file would be broken, but I still think
> iso-8859-1 is a reasonable default.

I'm -1 here. Why do you think it is a reasonable default?

Errors should never pass silently.
Unless explicitly silenced.

While iso-8859-1 might be a reasonable default in other application
domains, message catalogs typically contain non-English text, and
assuming Latin-1 for such text is bound to create mojibake.
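
To illustrate the point, here is a minimal sketch (in present-day Python,
with a made-up Polish sample string) of why a Latin-1 default hides the
problem instead of reporting it: every byte value is valid Latin-1, so
the decode never fails, it just yields the wrong characters.

```python
# Hypothetical msgstr: Polish text stored as ISO-8859-2 bytes,
# in a catalog that does not declare its charset.
raw = "Zażółć gęślą jaźń".encode("iso-8859-2")

# Decoding with the proposed default never raises, because every
# byte value maps to some Latin-1 character.
guessed = raw.decode("iso-8859-1")

# Decoding with the charset the translator actually used.
correct = raw.decode("iso-8859-2")

# Same length, different characters: the error passes silently.
print(guessed == correct)
```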

If your application can accept creating mojibake, I suggest a method
setdefaultencoding on the catalog, which has no effect if an encoding
was found in the catalog.
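
A rough sketch of what I mean, written in present-day Python. The class
SketchCatalog and its attributes are hypothetical stand-ins, not the
actual gettext classes; only the method name setdefaultencoding comes
from the suggestion above.

```python
class SketchCatalog:
    """Hypothetical stand-in for a message catalog (not the real gettext class)."""

    def __init__(self, messages, charset=None):
        self._catalog = messages   # maps bytes msgid -> bytes msgstr
        self._charset = charset    # None: the file declared no charset

    def setdefaultencoding(self, encoding):
        # Only fills the gap; a charset found in the catalog always wins.
        if self._charset is None:
            self._charset = encoding

    def ugettext(self, message):
        # Without a declared or explicitly chosen charset, fail loudly.
        if self._charset is None:
            raise ValueError("catalog declares no charset and no default was set")
        tmsg = self._catalog.get(message.encode("ascii"))
        if tmsg is None:
            return message
        return tmsg.decode(self._charset)
```

With this shape, an application that can live with mojibake opts in
explicitly by calling setdefaultencoding("iso-8859-1"); everyone else
gets an error for a broken catalog.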

> - Add a "coerce" default argument to GNUTranslations's constructor.  The
> reason for this is that in Zope, we want all msgids and msgstrs to be
> Unicode.  For the latter, we could use .ugettext() but there isn't
> currently a mechanism for Unicode-ifying msgids.

Could you please explain in what context this is needed? msgids are
ASCII, and you can pass a Unicode string to ugettext just fine.

> The plan then is that the charset parameter specifies the encoding for
> both the msgids and msgstrs, and both are decoded to Unicode when read. 
> For example, we might encode po files with utf-8. I think the GNU
> gettext tools don't care.

They complain loudly if they find bytes > 127 in the msgid.

> Since this could potentially break code [*] that wants to use the
> encoded interface .gettext(), the constructor flag is added, defaulting
> to False.  Most code I suspect will want to set this to True and use
> .ugettext().

To avoid breakage, you could define ugettext as

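   def ugettext(self, message):
       if isinstance(message, unicode):
           # The catalog is keyed by encoded msgids, so encode for the lookup.
           tmsg = self._catalog.get(message.encode(self._charset))
           if tmsg is None:
               return message
       else:
           # Byte-string msgid: look it up as before, falling back to itself.
           tmsg = self._catalog.get(message, message)
       return unicode(tmsg, self._charset)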
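
For readers on a current interpreter, the same dispatch can be sketched
as a standalone function. This is only an illustration: the function
takes the catalog and charset as arguments instead of reading them from
self, and it assumes the catalog maps encoded msgids to encoded msgstrs.

```python
def ugettext(catalog, charset, message):
    """Look up message in catalog and return the translation as str.

    Assumes catalog maps encoded msgid bytes to encoded msgstr bytes.
    """
    if isinstance(message, str):
        # Text msgid: encode it to match the catalog's byte keys.
        tmsg = catalog.get(message.encode(charset))
        if tmsg is None:
            return message
    else:
        # Bytes msgid: look it up directly, falling back to the msgid.
        tmsg = catalog.get(message, message)
    return tmsg.decode(charset)
```

Callers that pass encoded msgids keep working, and callers that pass
text get text back either way.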

> - A few other minor changes from the Zope project, including asserting
> that a zero-length msgid must have a Project-ID-Version header for it to
> be counted as the metadata record.

That test used to be there, and was removed at the request of Bruno
Haible, the GNU gettext maintainer, who pointed out that
Project-ID-Version is not mandatory for the metadata (see Patch #700839).
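
As a minimal sketch of the consequence (in present-day Python, and not
the code from the patch): the msgstr of the zero-length msgid is parsed
as metadata unconditionally, and the charset is taken from Content-Type
if present. The helper name parse_metadata and its fallback_charset
parameter are made up for illustration.

```python
def parse_metadata(msgstr, fallback_charset=None):
    """Parse the msgstr of the zero-length msgid into (info, charset).

    No particular header is required; whatever "Key: value" lines
    are present get collected.
    """
    info = {}
    for line in msgstr.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        info[key.strip().lower()] = value.strip()

    charset = fallback_charset
    content_type = info.get("content-type", "")
    if "charset=" in content_type:
        # Take the charset parameter, dropping any parameters after it.
        charset = content_type.split("charset=", 1)[1].split(";")[0].strip()
    return info, charset
```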

Regards,
Martin