[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
11 Apr 2003 16:26:59 -0400
On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote:
> Barry Warsaw wrote:
> > - Set the default charset to iso-8859-1. It used to be None, which
> > would cause problems with .ugettext() if the file had no charset
> > parameter. Arguably, the po/mo file would be broken, but I still think
> > iso-8859-1 is a reasonable default.
> I'm -1 here. Why do you think it is a reasonable default?
> Errors should never pass silently.
> Unless explicitly silenced.
> While iso-8859-1 might be a reasonable default in other application
> domains, in the context of non-English text (which it typically is),
> assuming Latin-1 is bound to create mojibake.
Okay, never mind, I'll back this one out. The problem was caused by my
other patch to unicode-ify on read (see below) without first having a
charset. I have a different fix for this.
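To make the objection concrete, here is a minimal sketch (modern Python 3 syntax, with a made-up sample string) of why a Latin-1 default hides the problem: every byte sequence decodes under iso-8859-1, so UTF-8 data silently turns into mojibake, whereas a strict codec fails loudly.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
original = "Gr\u00fc\u00dfe"            # "Gruesse" spelled with u-umlaut and sharp s
raw = original.encode("utf-8")          # bytes as a UTF-8 catalog would store them

# Latin-1 maps every byte to a character, so this never raises.
wrong = raw.decode("iso-8859-1")

# Decoding with the real charset round-trips correctly.
right = raw.decode("utf-8")

# A strict default (ASCII here) surfaces the missing charset as an error.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(wrong), repr(right), strict_failed)
```

The silent path yields a longer, garbled string, while the strict path raises, which is the "errors should never pass silently" behavior argued for above.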
> > - Add a "coerce" default argument to GNUTranslations's constructor. The
> > reason for this is that in Zope, we want all msgids and msgstrs to be
> > Unicode. For the latter, we could use .ugettext() but there isn't
> > currently a mechanism for Unicode-ifying msgids.
> Could you please explain in what context this is needed? msgids are ASCII, and
> you can pass a Unicode string to ugettext just fine.
In Zope, all strings are Unicode and the catalog may include messages
that are extracted from places other than Python source code, e.g.
XML-based files. Message ids can contain non-ASCII characters if they
are written by a non-English coder. I think in that case, we'd want to
do something like encode the strings possibly with utf-8 for the .po/.mo
files, but we want them decoded in time to look the Unicode strings up
in the catalog.
Similarly, what happens if a non-English coder writes an i18n'd Python
module with native strings, possibly using a Python 2.3 coding cookie?
We'd want their message ids to be extracted into the .mo/.po files,
> > The plan then is that the charset parameter specifies the encoding for
> > both the msgids and msgstrs, and both are decoded to Unicode when read.
> > For example, we might encode po files with utf-8. I think the GNU
> > gettext tools don't care.
> They complain loudly if they find bytes > 127 in the msgid.
Really? Ok, I'm still confused because I tried the following example:
I wrote a .po file (charset=utf-8) with the following record:
I used standard msgfmt to turn that into a .mo file. Then I created a
GNUTranslations(fp, coerce=True) and called
This is what I should expect, right? ;)
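A rough way to reproduce this kind of experiment without msgfmt is sketched below. It assumes a current Python 3 gettext module, whose GNUTranslations takes no `coerce` argument and decodes msgids and msgstrs with the catalog charset. The helper `build_mo` and the sample msgid are hypothetical and are not the record from the test described above.

```python
import gettext
import io
import struct


def build_mo(messages, charset="utf-8"):
    """Hypothetical helper: pack a dict of msgid -> msgstr into .mo bytes."""
    # The empty msgid carries the metadata record, including the charset.
    entries = {"": "Content-Type: text/plain; charset=%s\n" % charset}
    entries.update(messages)
    keys = sorted(entries)
    count = len(keys)

    # Layout: 7-word header, msgid table, msgstr table, then string data.
    ids_offset = 7 * 4
    strs_offset = ids_offset + 8 * count
    data_offset = strs_offset + 8 * count

    table = []
    blob = b""
    for text in keys + [entries[k] for k in keys]:
        raw = text.encode(charset)
        table.append((len(raw), data_offset + len(blob)))
        blob += raw + b"\0"          # every string is NUL-terminated

    # Little-endian magic, version 0, count, table offsets, no hash table.
    out = struct.pack("<7I", 0x950412DE, 0, count, ids_offset, strs_offset, 0, 0)
    for length, offset in table:
        out += struct.pack("<2I", length, offset)
    return out + blob


# Hypothetical non-ASCII msgid, stored as UTF-8 in the catalog.
msgid = "Gr\u00fc\u00dfe"
catalog = gettext.GNUTranslations(io.BytesIO(build_mo({msgid: "Greetings"})))

translated = catalog.gettext(msgid)       # lookup with a Unicode msgid
untranslated = catalog.gettext("absent")  # unknown msgids fall through unchanged
print(translated, untranslated)
```

Because both sides of the catalog are decoded on read, the lookup key is an ordinary Unicode string and no separate encode step is needed at call time.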
> > - A few other minor changes from the Zope project, including asserting
> > that a zero-length msgid must have a Project-ID-Version header for it to
> > be counted as the metadata record.
> That test was there, and removed on request of Bruno Haible, the GNU
> gettext maintainer, as he points out that Project-ID-Version is not
> mandatory for the metadata (see Patch #700839).
Ah, I read the diff backwards in this case. I'll back this one out too.
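For reference, a minimal sketch of treating any zero-length msgid as the metadata record, without requiring a particular header to be present. The function names `parse_metadata` and `charset_of` are illustrative only and are not the actual gettext.py code.

```python
def parse_metadata(msgstr):
    """Hypothetical helper: split a metadata msgstr into a header dict."""
    info = {}
    last_key = None
    for line in msgstr.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip().lower()
            info[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: append to the previous header.
            info[last_key] += "\n" + line
    return info


def charset_of(info):
    """Return the declared charset, or None when the header omits it."""
    content_type = info.get("content-type", "")
    if "charset=" not in content_type:
        return None
    return content_type.split("charset=", 1)[1].strip()


# A metadata record with no Project-Id-Version header is still usable.
info = parse_metadata("Content-Type: text/plain; charset=utf-8\nLanguage-Team: none\n")
print(charset_of(info))
```

Returning None for a missing charset leaves the caller to decide how to fail, rather than guessing an encoding.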