[I18n-sig] bugs in gettext.py plural handling
Bruno Haible
bruno@clisp.org
Mon, 24 Feb 2003 14:31:42 +0100 (CET)
Hi,
Testing GNU gettext's integration test with Python 2.3a2, I see that
there are several bugs relating to plural forms and the ngettext function.
1)
$ python
import gettext
germanic = gettext.c2py('!(n == 1)')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 110, in c2py
stack[-1] += '(%s)' % s
IndexError: list index out of range
The ! operator is treated incorrectly if not followed by a space: the
substitution also swallows the character after the '!', here the opening
parenthesis.
Here is a fix.
*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 88,95 ****
plural = plural.replace('&&', ' and ')
plural = plural.replace('||', ' or ')

! expr = re.compile(r'\![^=]')
! plural = expr.sub(' not ', plural)

# Regular expression and replacement function used to transform
# "a?b:c" to "test(a,b,c)".
--- 88,95 ----
plural = plural.replace('&&', ' and ')
plural = plural.replace('||', ' or ')

! expr = re.compile(r'\!([^=])')
! plural = expr.sub(' not \\1', plural)

# Regular expression and replacement function used to transform
# "a?b:c" to "test(a,b,c)".
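A minimal sketch of the difference between the two substitutions, written for a current Python 3 interpreter rather than 2.3a2. The two patterns are the ones from the patch; the variable names are only illustrative.

```python
import re

# Pattern before the patch: matches '!' plus the following character
# and replaces both, so the character after '!' is lost.
old_expr = re.compile(r'\![^=]')
# Pattern after the patch: captures the following character and puts it back.
new_expr = re.compile(r'\!([^=])')

plural = '!(n == 1)'
old_result = old_expr.sub(' not ', plural)
new_result = new_expr.sub(' not \\1', plural)

print(repr(old_result))  # the opening parenthesis is gone
print(repr(new_result))  # the opening parenthesis is preserved

# In both versions "!=" is left alone, because [^=] refuses to match '='.
untouched = new_expr.sub(' not \\1', 'n != 1')
```

With the old pattern the closing parenthesis no longer has a matching opening one, which is what later triggers the IndexError on the stack.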
2) Unbalanced parentheses in a plural expression don't produce the intended
error 'unbalanced parenthesis in plural form'.
Example:
$ python
import gettext
germanic = gettext.c2py('n =)= 1')
Instead we get a weird error message:
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
Furthermore even if this error were avoided, we would get
IndexError: list index out of range
Here is a fix for the second half of this bug. I don't know Python
well enough to fix the first half as well.
*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 104,110 ****
if c == '(':
stack.append('')
elif c == ')':
! if len(stack) == 0:
raise ValueError, 'unbalanced parenthesis in plural form'
s = expr.sub(repl, stack.pop())
stack[-1] += '(%s)' % s
--- 104,110 ----
if c == '(':
stack.append('')
elif c == ')':
! if len(stack) == 1:
raise ValueError, 'unbalanced parenthesis in plural form'
s = expr.sub(repl, stack.pop())
stack[-1] += '(%s)' % s
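The effect of the changed test can be reproduced in isolation. This is only a sketch in Python 3 syntax: it assumes the stack starts out holding one empty string (which the `stack[-1] += ...` line implies), the function name `rebuild` and its `floor` parameter are hypothetical, and the `expr.sub(repl, ...)` rewriting of ternaries is left out.

```python
def rebuild(plural, floor=1):
    """Re-assemble a plural expression while tracking parenthesis depth.

    floor=0 mimics the test before the patch, floor=1 the test after it.
    """
    stack = ['']  # assumption: one entry for the outermost level
    for c in plural:
        if c == '(':
            stack.append('')
        elif c == ')':
            if len(stack) == floor:
                raise ValueError('unbalanced parenthesis in plural form')
            s = stack.pop()
            # With floor=0 and a stray ')', the stack is empty here.
            stack[-1] += '(%s)' % s
        else:
            stack[-1] += c
    return stack.pop()


def outcome(plural, floor):
    """Return the name of the exception raised, or None if none was raised."""
    try:
        rebuild(plural, floor)
    except (ValueError, IndexError) as exc:
        return type(exc).__name__
    return None


print(outcome('n =)= 1', floor=0))
print(outcome('n =)= 1', floor=1))
```

Since the outermost level always occupies one stack slot, a stray ')' is seen when the length is 1, never 0, so the old test could not fire.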
3) Here's my test code (in ISO-8859-1):
===================== prog.py ============================
import sys
import gettext
n = int(sys.argv[1])
gettext.textdomain('prog')
gettext.bindtextdomain('prog', '.')
print gettext.gettext("'Your command, please?', asked the waiter.")
print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
% { 'count': n }
print gettext.gettext("%(oldCurrency)s is replaced by %(newCurrency)s.") \
% { 'oldCurrency': "FF", 'newCurrency' : "EUR" }
======================= fr.po ============================
msgid ""
msgstr ""
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
msgid "'Your command, please?', asked the waiter."
msgstr "«Votre commande, s'il vous plait», dit le garçon."
# Les gateaux allemands sont les meilleurs du monde.
#, python-format
msgid "a piece of cake"
msgid_plural "%(count)d pieces of cake"
msgstr[0] "un morceau de gateau"
msgstr[1] "%(count)d morceaux de gateau"
# Reverse the arguments.
#, python-format
msgid "%(oldCurrency)s is replaced by %(newCurrency)s."
msgstr "%(newCurrency)s remplace %(oldCurrency)s."
==========================================================
$ mkdir -p fr/LC_MESSAGES
$ msgfmt -o fr/LC_MESSAGES/prog.mo fr.po
$ LANGUAGE= LC_ALL=fr_FR python prog.py 2
«Votre commande, s'il vous plait», dit le garçon.
Traceback (most recent call last):
File "prog.py", line 10, in ?
print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 445, in ngettext
return dngettext(_current_domain, msgid1, msgid2, n)
File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 437, in dngettext
return t.ngettext(msgid1, msgid2, n)
File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 294, in ngettext
return self._catalog[(msgid1, self.plural(n))]
AttributeError: GNUTranslations instance has no attribute 'plural'
Why does it have no 'plural' attribute?
* Testing that the header entry starts with 'Project-Id-Version:' is
not appropriate because it excludes valid header entries. The gettext
tools may remove or move this line in future versions.
* libintl and msgfmt assume a fallback of "n != 1" if no Plural-Forms:
entry is provided. In the same way, self.plural should use "n != 1" as a
fallback.
Here is the fix for both.
*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 114,119 ****
--- 114,121 ----

return eval('lambda n: int(%s)' % plural)

+ _germanic_plural = lambda n: int(n != 1)
+


def _expand_lang(locale):
***************
*** 225,230 ****
--- 227,233 ----
# Parse the .mo file header, which consists of 5 little endian 32
# bit words.
self._catalog = catalog = {}
+ self.plural = _germanic_plural
buf = fp.read()
buflen = len(buf)
# Are we big endian or little endian?
***************
*** 258,264 ****
else:
raise IOError(0, 'File is corrupt', filename)
# See if we're looking at GNU .mo conventions for metadata
! if mlen == 0 and tmsg.lower().startswith('project-id-version:'):
# Catalog description
for item in tmsg.split('\n'):
item = item.strip()
--- 261,267 ----
else:
raise IOError(0, 'File is corrupt', filename)
# See if we're looking at GNU .mo conventions for metadata
! if mlen == 0:
# Catalog description
for item in tmsg.split('\n'):
item = item.strip()
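The two points can be illustrated together with a small stand-alone sketch in Python 3 syntax. It is not the code of gettext.py: the function names `germanic_plural` and `plural_from_header` are hypothetical, and it relies on the `c2py` function of the current standard `gettext` module to compile the C expression.

```python
import gettext


def germanic_plural(n):
    """Fallback rule "n != 1", the same one libintl and msgfmt assume."""
    return int(n != 1)


def plural_from_header(header):
    """Return a plural-index function for a header entry (msgstr of msgid "").

    No particular first line such as 'Project-Id-Version:' is required;
    if there is no Plural-Forms: line, the germanic fallback is returned.
    """
    plural = germanic_plural
    for item in header.split('\n'):
        item = item.strip()
        if ':' not in item:
            continue
        key, value = item.split(':', 1)
        if key.strip().lower() != 'plural-forms':
            continue
        # value looks like " nplurals=2; plural=(n > 1);"
        for field in value.split(';'):
            name, sep, expr = field.strip().partition('=')
            if sep and name.strip() == 'plural':
                plural = gettext.c2py(expr.strip())
    return plural


french = plural_from_header(
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Plural-Forms: nplurals=2; plural=(n > 1);\n")
default = plural_from_header("Content-Type: text/plain; charset=UTF-8\n")

print([french(n) for n in (0, 1, 2)])
print([default(n) for n in (0, 1, 2)])
```

Neither header starts with a Project-Id-Version: line, yet both yield a usable plural function, so a lookup such as `self._catalog[(msgid1, self.plural(n))]` always has a `plural` to call.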
4) Btw, I have to correct a misimpression. It was claimed in
http://mail.python.org/pipermail/i18n-sig/2002-November/001514.html
that GNU xgettext 0.11.5 doesn't support ngettext in Python. But it does
if you add the command line options "-kgettext -kngettext:1,2". These
options are needed because, when xgettext 0.11.5 was released, Python didn't
have the ngettext function, and no one told me that it would.
So for example,
$ xgettext -kgettext -kngettext:1,2 -o - prog.py
produces the .pot file for prog.py above.
Bruno