[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
"Martin v. Löwis"
martin@v.loewis.de
Fri, 11 Apr 2003 21:54:50 +0200
Barry Warsaw wrote:
> - Set the default charset to iso-8859-1. It used to be None, which
> would cause problems with .ugettext() if the file had no charset
> parameter. Arguably, the po/mo file would be broken, but I still think
> iso-8859-1 is a reasonable default.
I'm -1 here. Why do you think it is a reasonable default?
Errors should never pass silently.
Unless explicitly silenced.
While iso-8859-1 might be a reasonable default in other application
domains, gettext catalogs typically contain non-English text, and
assuming Latin-1 there is bound to create mojibake.
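
To illustrate the failure mode (a minimal sketch in present-day Python 3
syntax, not code from the patch; the sample string is my own choice): a
translation stored as UTF-8 but decoded as Latin-1 does not raise an
error, it silently yields garbage.

```python
# A msgstr as it would sit in a .mo file whose real encoding is UTF-8.
raw = "Löwis".encode("utf-8")

# Latin-1 maps every byte to some character, so a wrong guess never raises.
garbled = raw.decode("iso-8859-1")

# Decoding with the charset the catalog should have declared round-trips.
correct = raw.decode("utf-8")

print(repr(garbled))
print(repr(correct))
```

The wrong decode succeeds, which is exactly why the error would pass
silently.
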
If your application can accept creating mojibake, I suggest a method
setdefaultencoding on the catalog, which has no effect if an encoding
was found in the catalog.
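
Roughly, and only as a sketch: the class below is a hypothetical
stand-in written in Python 3 syntax, not the actual GNUTranslations
code, and the charset() accessor is assumed for the demonstration.

```python
class Catalog:
    """Hypothetical stand-in for a translations object; not the gettext API."""

    def __init__(self, declared_charset=None):
        # Charset taken from the catalog's Content-Type header, if any.
        self._charset = declared_charset

    def setdefaultencoding(self, encoding):
        # Only fills the gap: a charset found in the catalog always wins.
        if self._charset is None:
            self._charset = encoding

    def charset(self):
        return self._charset


broken = Catalog()                  # catalog without a charset parameter
broken.setdefaultencoding("iso-8859-1")

declared = Catalog("utf-8")         # catalog that declares its charset
declared.setdefaultencoding("iso-8859-1")

print(broken.charset(), declared.charset())
```

That way the application has to opt in to the guess explicitly, and a
correctly declared charset is never overridden.
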
> - Add a "coerce" default argument to GNUTranslations's constructor. The
> reason for this is that in Zope, we want all msgids and msgstrs to be
> Unicode. For the latter, we could use .ugettext() but there isn't
> currently a mechanism for Unicode-ifying msgids.
Could you please explain in what context this is needed? msgids are
ASCII, and you can pass a Unicode string to ugettext just fine.
> The plan then is that the charset parameter specifies the encoding for
> both the msgids and msgstrs, and both are decoded to Unicode when read.
> For example, we might encode po files with utf-8. I think the GNU
> gettext tools don't care.
They complain loudly if they find bytes > 127 in the msgid.
> Since this could potentially break code [*] that wants to use the
> encoded interface .gettext(), the constructor flag is added, defaulting
> to False. Most code I suspect will want to set this to True and use
> .ugettext().
To avoid breakage, you could define ugettext as
    def ugettext(self, message):
        if isinstance(message, unicode):
            tmsg = self._catalog.get(message.encode(self._charset))
            if tmsg is None:
                return message
        else:
            tmsg = self._catalog.get(message, message)
        return unicode(tmsg, self._charset)
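
For experimenting with this, here is the same logic as a self-contained
sketch in Python 3 terms, where str plays the role of unicode and bytes
the role of the encoded string. The class name and the sample catalog
are hypothetical, and I assume the catalog maps encoded msgids to
encoded msgstrs.

```python
class SketchTranslations:
    """Hypothetical catalog holding encoded msgids and msgstrs as bytes."""

    def __init__(self, catalog, charset):
        self._catalog = catalog     # dict mapping bytes -> bytes
        self._charset = charset

    def ugettext(self, message):
        if isinstance(message, str):
            # Text msgid: encode it to look it up among the byte keys.
            tmsg = self._catalog.get(message.encode(self._charset))
            if tmsg is None:
                return message      # no translation, hand the msgid back
        else:
            # Encoded msgid: look it up as-is, falling back to itself.
            tmsg = self._catalog.get(message, message)
        return tmsg.decode(self._charset)


t = SketchTranslations({b"Hello": "Grüß Gott".encode("utf-8")}, "utf-8")

print(t.ugettext("Hello"))      # text msgid
print(t.ugettext(b"Hello"))     # encoded msgid, same translation
print(t.ugettext("Missing"))    # untranslated msgid comes back unchanged
```

Both kinds of msgid give the same text result, so existing callers of
the encoded interface keep working without a constructor flag.
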
> - A few other minor changes from the Zope project, including asserting
> that a zero-length msgid must have a Project-ID-Version header for it to
> be counted as the metadata record.
That test was there, and removed on request of Bruno Haible, the GNU
gettext maintainer, as he points out that Project-ID-Version is not
mandatory for the metadata (see Patch #700839).
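
As a sketch of what that means in practice (Python 3 syntax; both
helper names are hypothetical and this is not the code in gettext.py):
the msgstr of the zero-length msgid can be treated as the metadata
record and searched for a charset whether or not Project-ID-Version is
present.

```python
def parse_metadata(msgstr):
    """Hypothetical helper: split the empty msgid's msgstr into headers."""
    headers = {}
    for line in msgstr.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        headers[key.strip().lower()] = value.strip()
    return headers


def charset_of(headers):
    """Return the charset parameter of Content-Type, or None if absent."""
    content_type = headers.get("content-type", "")
    if "charset=" not in content_type:
        return None
    return content_type.split("charset=", 1)[1].strip()


# A metadata record without Project-ID-Version is still a metadata record.
meta = parse_metadata(
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)

print(charset_of(meta))
```
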
Regards,
Martin