[I18n-sig] bugs in gettext.py plural handling

Mon, 24 Feb 2003 14:31:42 +0100 (CET)

Hi,

Testing GNU gettext's integration test with Python 2.3a2, I see that
there are several bugs relating to plural forms and the ngettext function.

1)

$ python
import gettext
germanic = gettext.c2py('!(n == 1)')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 110, in c2py
    stack[-1] += '(%s)' % s
IndexError: list index out of range

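The ! operator is treated incorrectly if not followed by a space: the
substitution replaces the character after the ! as well, so here the
opening parenthesis is lost.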

Here is a fix.

*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 88,95 ****
      plural = plural.replace('&&', ' and ')
      plural = plural.replace('||', ' or ')
  
!     expr = re.compile(r'\![^=]')
!     plural = expr.sub(' not ', plural)
  
      # Regular expression and replacement function used to transform
      # "a?b:c" to "test(a,b,c)".
--- 88,95 ----
      plural = plural.replace('&&', ' and ')
      plural = plural.replace('||', ' or ')
  
!     expr = re.compile(r'\!([^=])')
!     plural = expr.sub(' not \\1', plural)
  
      # Regular expression and replacement function used to transform
      # "a?b:c" to "test(a,b,c)".
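The difference between the two patterns can be checked in isolation.
This is only a minimal sketch, not the c2py code itself; it is written
for a current Python 3 interpreter, and the names OLD_PATTERN and
NEW_PATTERN are made up for the illustration.

```python
import re

# Pattern before the fix: the character after '!' is part of the match,
# so it is replaced together with the '!'.
OLD_PATTERN = re.compile(r'\![^=]')
# Pattern after the fix: the following character is captured and put back.
NEW_PATTERN = re.compile(r'\!([^=])')

expression = '!(n == 1)'

old_result = OLD_PATTERN.sub(' not ', expression)
new_result = NEW_PATTERN.sub(r' not \1', expression)

print(repr(old_result))  # ' not n == 1)'   (the '(' has been swallowed)
print(repr(new_result))  # ' not (n == 1)'

# '!=' is left alone, because '=' is excluded from the character class.
unchanged = NEW_PATTERN.sub(r' not \1', 'n != 1')
print(repr(unchanged))   # 'n != 1'
```

With the old pattern the closing parenthesis is left without a partner,
which is consistent with the IndexError in the traceback above.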

2) Unbalanced parentheses in a plural expression don't produce the
intended error, 'unbalanced parenthesis in plural form'.

Example:
$ python
import gettext
germanic = gettext.c2py('n =)= 1')

Instead we get a weird error message:

tokenize.TokenError: ('EOF in multi-line statement', (2, 0))

Furthermore even if this error were avoided, we would get

IndexError: list index out of range

Here is a fix for the second half of this bug. I don't know Python
well enough to fix the first half as well.

*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 104,110 ****
          if c == '(':
              stack.append('')
          elif c == ')':
!             if len(stack) == 0:
                  raise ValueError, 'unbalanced parenthesis in plural form'
              s = expr.sub(repl, stack.pop())
              stack[-1] += '(%s)' % s
--- 104,110 ----
          if c == '(':
              stack.append('')
          elif c == ')':
!             if len(stack) == 1:
                  raise ValueError, 'unbalanced parenthesis in plural form'
              s = expr.sub(repl, stack.pop())
              stack[-1] += '(%s)' % s
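Why the test has to compare against 1 rather than 0 can be seen in a
stripped-down version of the loop. This is a sketch under assumptions,
not the real c2py: it assumes, as the patch implies, that the stack
starts out with one entry for the text outside any parentheses. The
function name rebuild and the parameter empty_depth are hypothetical,
and the check after the loop is an extra that is not part of the patch.
It is written for a current Python 3 interpreter.

```python
def rebuild(expression, empty_depth=1):
    """Reassemble expression, raising ValueError on unbalanced parentheses.

    empty_depth is the stack length that means "no parenthesis is open".
    1 corresponds to the patched test, 0 to the test before the patch.
    """
    # One entry for the text outside any parentheses (assumption, see above).
    stack = ['']
    for c in expression:
        if c == '(':
            stack.append('')
        elif c == ')':
            if len(stack) == empty_depth:
                raise ValueError('unbalanced parenthesis in plural form')
            inner = stack.pop()
            stack[-1] += '(%s)' % inner
        else:
            stack[-1] += c
    # Extra check, not in the patch: a '(' that was never closed.
    if len(stack) != 1:
        raise ValueError('unbalanced parenthesis in plural form')
    return stack[0]


for depth in (1, 0):
    try:
        rebuild('n =)= 1', empty_depth=depth)
    except (ValueError, IndexError) as exc:
        print(depth, type(exc).__name__, exc)
```

With empty_depth=1 the stray ')' is reported as a ValueError; with
empty_depth=0 the pop empties the stack and the following stack[-1]
fails with IndexError.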

3) Here's my test code (in ISO-8859-1):

===================== prog.py ============================
import sys
import gettext

n = int(sys.argv[1])

gettext.textdomain('prog')
gettext.bindtextdomain('prog', '.')

print gettext.gettext("'Your command, please?', asked the waiter.")
print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
      % { 'count': n }
print gettext.gettext("%(oldCurrency)s is replaced by %(newCurrency)s.") \
      % { 'oldCurrency': "FF", 'newCurrency' : "EUR" }
======================= fr.po ============================
msgid ""
msgstr ""
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

msgid "'Your command, please?', asked the waiter."
msgstr "«Votre commande, s'il vous plait», dit le garçon."

# Les gateaux allemands sont les meilleurs du monde.
#, python-format
msgid "a piece of cake"
msgid_plural "%(count)d pieces of cake"
msgstr[0] "un morceau de gateau"
msgstr[1] "%(count)d morceaux de gateau"

# Reverse the arguments.
#, python-format
msgid "%(oldCurrency)s is replaced by %(newCurrency)s."
msgstr "%(newCurrency)s remplace %(oldCurrency)s."
==========================================================

$ mkdir -p fr/LC_MESSAGES
$ msgfmt -o fr/LC_MESSAGES/prog.mo fr.po
$ LANGUAGE= LC_ALL=fr_FR python prog.py 2
«Votre commande, s'il vous plait», dit le garçon.
Traceback (most recent call last):
  File "prog.py", line 10, in ?
    print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 445, in ngettext
    return dngettext(_current_domain, msgid1, msgid2, n)
  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 437, in dngettext
    return t.ngettext(msgid1, msgid2, n)
  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 294, in ngettext
    return self._catalog[(msgid1, self.plural(n))]
AttributeError: GNUTranslations instance has no attribute 'plural'

Why does it have no 'plural' attribute?

* Testing that the header entry starts with 'Project-Id-Version:' is
not appropriate because it excludes valid header entries. The gettext
tools may remove or move this line in future versions.

* libintl and msgfmt assume a fallback of "n != 1" if no Plural-Forms:
entry is provided. In the same way, self.plural should use "n != 1" as a
fallback.

Here is the fix for both.

*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
--- gettext.py	2003-02-22 21:37:33.000000000 +0100
***************
*** 114,119 ****
--- 114,121 ----
  
      return eval('lambda n: int(%s)' % plural)
  
+ _germanic_plural = lambda n: int(n != 1)
+ 
  
  
  def _expand_lang(locale):
***************
*** 225,230 ****
--- 227,233 ----
          # Parse the .mo file header, which consists of 5 little endian 32
          # bit words.
          self._catalog = catalog = {}
+         self.plural = _germanic_plural
          buf = fp.read()
          buflen = len(buf)
          # Are we big endian or little endian?
***************
*** 258,264 ****
              else:
                  raise IOError(0, 'File is corrupt', filename)
              # See if we're looking at GNU .mo conventions for metadata
!             if mlen == 0 and tmsg.lower().startswith('project-id-version:'):
                  # Catalog description
                  for item in tmsg.split('\n'):
                      item = item.strip()
--- 261,267 ----
              else:
                  raise IOError(0, 'File is corrupt', filename)
              # See if we're looking at GNU .mo conventions for metadata
!             if mlen == 0:
                  # Catalog description
                  for item in tmsg.split('\n'):
                      item =3D item.strip()
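To illustrate the fallback behaviour, here is a small sketch of how a
header entry without a Plural-Forms: line could still yield a usable
rule. It is not the gettext.py parsing code; the function names
plural_expression, germanic_plural and french_plural are hypothetical,
the two headers are cut down from the fr.po above, and it is written
for a current Python 3 interpreter.

```python
def germanic_plural(n):
    # Fallback rule "n != 1", as assumed by libintl and msgfmt.
    return int(n != 1)


def french_plural(n):
    # Rule "plural=(n > 1)" from the fr.po header, written out by hand.
    return int(n > 1)


def plural_expression(header, fallback='n != 1'):
    """Return the plural= expression of a header entry, or the fallback."""
    for line in header.split('\n'):
        key, colon, value = line.partition(':')
        if colon and key.strip().lower() == 'plural-forms':
            for field in value.split(';'):
                name, equals, expr = field.partition('=')
                if equals and name.strip() == 'plural':
                    return expr.strip()
    return fallback


with_rule = ("Content-Type: text/plain; charset=ISO-8859-1\n"
             "Plural-Forms: nplurals=2; plural=(n > 1);\n")
without_rule = "Content-Type: text/plain; charset=ISO-8859-1\n"

print(plural_expression(with_rule))     # (n > 1)
print(plural_expression(without_rule))  # n != 1

print([germanic_plural(n) for n in (0, 1, 2)])  # [1, 0, 1]
print([french_plural(n) for n in (0, 1, 2)])    # [0, 0, 1]
```

Note that neither header starts with 'Project-Id-Version:', which is
the situation the last hunk of the patch is meant to handle.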

4) Btw, I have to correct a misconception. It was claimed in
http://mail.python.org/pipermail/i18n-sig/2002-November/001514.html
that GNU xgettext 0.11.5 doesn't support ngettext in Python. But it does
if you add the command line options "-kgettext -kngettext:1,2". The
reason these options are needed is that when xgettext 0.11.5 was
released, Python didn't have the ngettext function, and no one told me
that it would.

So for example,

   $ xgettext -kgettext -kngettext:1,2 -o - prog.py

produces the .pot file for prog.py above.

Bruno