[I18n-sig] Re: bugs in gettext.py plural handling

Mon, 10 Mar 2003 15:12:58 +0100

Hi,

I have posted a patch to the tracker [1] that contains
all your fixes. For number 2, I simply caught the
tokenize.TokenError exception.
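
A minimal sketch of that idea, not the patch itself: it assumes Python 3's tokenize module, and the helper name tokens_or_value_error is made up for illustration. The tokenizer's failure on an unterminated expression is turned into the ValueError that c2py already uses for unbalanced parentheses.

```python
import io
import tokenize


def tokens_or_value_error(plural):
    """Tokenize a plural expression; report tokenizer failures as ValueError.

    Hypothetical helper for illustration only, not a function of gettext.py.
    """
    try:
        readline = io.StringIO(plural).readline
        return list(tokenize.generate_tokens(readline))
    except tokenize.TokenError:
        # The tokenizer ran out of input inside an unfinished expression.
        raise ValueError("unbalanced parenthesis in plural form") from None


if __name__ == "__main__":
    try:
        tokens_or_value_error("(n == 1")
    except ValueError as exc:
        print("rejected:", exc)
```

The caller then sees one kind of error for a malformed plural form, whichever stage detected it.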

Best regards,
david

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=700839&group_id=5470&atid=305470

Bruno Haible wrote:

>Hi,
>
>Testing GNU gettext's integration test with Python 2.3a2, I see that
>there are several bugs relating to plural forms and the ngettext function.
>
>
>1)
>
>$ python
>>>> import gettext
>>>> germanic = gettext.c2py('!(n == 1)')
>Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
>  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 110, in c2py
>    stack[-1] += '(%s)' % s
>IndexError: list index out of range
>
>The ! operator is treated incorrectly if not followed by a space.
>
>Here is a fix.
>
>*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
>--- gettext.py	2003-02-22 21:37:33.000000000 +0100
>***************
>*** 88,95 ****
>      plural = plural.replace('&&', ' and ')
>      plural = plural.replace('||', ' or ')
>  
>!     expr = re.compile(r'\![^=]')
>!     plural = expr.sub(' not ', plural)
>  
>      # Regular expression and replacement function used to transform
>      # "a?b:c" to "test(a,b,c)".
>--- 88,95 ----
>      plural = plural.replace('&&', ' and ')
>      plural = plural.replace('||', ' or ')
>  
>!     expr = re.compile(r'\!([^=])')
>!     plural = expr.sub(' not \\1', plural)
>  
>      # Regular expression and replacement function used to transform
>      # "a?b:c" to "test(a,b,c)".
>
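The effect of the two patterns can be checked in isolation. This is only a sketch of the substitution step, written for Python 3; the function names negate_original and negate_patched are made up for illustration, and the patterns are the ones from the patch (without the redundant backslash before the "!").

```python
import re


def negate_original(plural):
    # Old pattern: the character after '!' is matched and thrown away.
    return re.sub(r"![^=]", " not ", plural)


def negate_patched(plural):
    # New pattern: the character after '!' is captured and put back via \1.
    return re.sub(r"!([^=])", r" not \1", plural)


if __name__ == "__main__":
    print(repr(negate_original("!(n == 1)")))  # ' not n == 1)'
    print(repr(negate_patched("!(n == 1)")))   # ' not (n == 1)'
    print(repr(negate_patched("n != 1")))      # 'n != 1'
```

The old pattern swallows the opening parenthesis, which leaves a lone ")" behind and leads to the IndexError in the traceback. In both versions "!=" is left alone because of the [^=] class.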
>
>2) Unbalanced parentheses in a plural expression don't give an error
>'unbalanced parenthesis in plural form'.
>
>Example:
>$ python
>>>> import gettext
>>>> germanic = gettext.c2py('n =)= 1')
>
>Instead we get a weird error message
>
>tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
>
>Furthermore, even if this error were avoided, we would get
>
>IndexError: list index out of range
>
>Here is a fix for the second half of this bug. I don't know Python
>well enough to fix the first half as well.
>
>*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
>--- gettext.py	2003-02-22 21:37:33.000000000 +0100
>***************
>*** 104,110 ****
>          if c == '(':
>              stack.append('')
>          elif c == ')':
>!             if len(stack) == 0:
>                  raise ValueError, 'unbalanced parenthesis in plural form'
>              s = expr.sub(repl, stack.pop())
>              stack[-1] += '(%s)' % s
>--- 104,110 ----
>          if c == '(':
>              stack.append('')
>          elif c == ')':
>!             if len(stack) == 1:
>                  raise ValueError, 'unbalanced parenthesis in plural form'
>              s = expr.sub(repl, stack.pop())
>              stack[-1] += '(%s)' % s
>
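Why the test has to be against 1 rather than 0 is easier to see on a stripped-down version of the loop. The sketch below is written for Python 3 and is not the code from gettext.py: it assumes the stack starts with one empty string for the outermost level (which is what the patched test implies), it leaves out the "a?b:c" rewriting, and the function name check_parens and the final check for an unclosed "(" are additions for illustration.

```python
def check_parens(plural):
    """Rebuild the expression level by level, rejecting unbalanced parentheses."""
    stack = [""]  # assumed: the outermost level is always on the stack
    for c in plural:
        if c == "(":
            stack.append("")
        elif c == ")":
            # Only the outermost level is left, so this ')' has no partner.
            # Testing for 0 here would never fire, and stack[-1] below
            # would fail with IndexError after the pop.
            if len(stack) == 1:
                raise ValueError("unbalanced parenthesis in plural form")
            inner = stack.pop()
            stack[-1] += "(%s)" % inner
        else:
            stack[-1] += c
    if len(stack) != 1:
        # Addition for this sketch: a '(' that was never closed.
        raise ValueError("unbalanced parenthesis in plural form")
    return stack[0]


if __name__ == "__main__":
    print(check_parens("(n > 1)"))
    try:
        check_parens("n =)= 1")
    except ValueError as exc:
        print("rejected:", exc)
```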
>
>3) Here's my test code (in ISO-8859-1):
>
>===================== prog.py ============================
>import sys
>import gettext
>
>n = int(sys.argv[1])
>
>gettext.textdomain('prog')
>gettext.bindtextdomain('prog', '.')
>
>print gettext.gettext("'Your command, please?', asked the waiter.")
>print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
>      % { 'count': n }
>print gettext.gettext("%(oldCurrency)s is replaced by %(newCurrency)s.") \
>      % { 'oldCurrency': "FF", 'newCurrency' : "EUR" }
>======================= fr.po ============================
>msgid ""
>msgstr ""
>"Content-Type: text/plain; charset=ISO-8859-1\n"
>"Plural-Forms: nplurals=2; plural=(n > 1);\n"
>
>msgid "'Your command, please?', asked the waiter."
>msgstr "«Votre commande, s'il vous plait», dit le garçon."
>
># Les gateaux allemands sont les meilleurs du monde.
>#, python-format
>msgid "a piece of cake"
>msgid_plural "%(count)d pieces of cake"
>msgstr[0] "un morceau de gateau"
>msgstr[1] "%(count)d morceaux de gateau"
>
># Reverse the arguments.
>#, python-format
>msgid "%(oldCurrency)s is replaced by %(newCurrency)s."
>msgstr "%(newCurrency)s remplace %(oldCurrency)s."
>==========================================================
>
>$ mkdir -p fr/LC_MESSAGES
>$ msgfmt -o fr/LC_MESSAGES/prog.mo fr.po
>$ LANGUAGE= LC_ALL=fr_FR python prog.py 2
>«Votre commande, s'il vous plait», dit le garçon.
>Traceback (most recent call last):
>  File "prog.py", line 10, in ?
>    print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
>  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 445, in ngettext
>    return dngettext(_current_domain, msgid1, msgid2, n)
>  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 437, in dngettext
>    return t.ngettext(msgid1, msgid2, n)
>  File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 294, in ngettext
>    return self._catalog[(msgid1, self.plural(n))]
>AttributeError: GNUTranslations instance has no attribute 'plural'
>
>Why does it have no 'plural' attribute?
>
>* Testing that the header entry starts with 'Project-Id-Version:' is
>not appropriate because it excludes valid header entries. The gettext
>tools may remove or move this line in future versions.
>
>* libintl and msgfmt assume a fallback of "n != 1" if no Plural-Forms:
>entry is provided. In the same way, self.plural should use "n != 1" as a
>fallback.
>
>Here is the fix for both.
>
>*** gettext.py.bak	2003-02-22 02:28:17.000000000 +0100
>--- gettext.py	2003-02-22 21:37:33.000000000 +0100
>***************
>*** 114,119 ****
>--- 114,121 ----
>  
>      return eval('lambda n: int(%s)' % plural)
>  
>+ _germanic_plural = lambda n: int(n != 1)
>+ 
>  
>  
>  def _expand_lang(locale):
>***************
>*** 225,230 ****
>--- 227,233 ----
>          # Parse the .mo file header, which consists of 5 little endian 32
>          # bit words.
>          self._catalog = catalog = {}
>+         self.plural = _germanic_plural
>          buf = fp.read()
>          buflen = len(buf)
>          # Are we big endian or little endian?
>***************
>*** 258,264 ****
>              else:
>                  raise IOError(0, 'File is corrupt', filename)
>              # See if we're looking at GNU .mo conventions for metadata
>!             if mlen == 0 and tmsg.lower().startswith('project-id-version:'):
>                  # Catalog description
>                  for item in tmsg.split('\n'):
>                      item = item.strip()
>--- 261,267 ----
>              else:
>                  raise IOError(0, 'File is corrupt', filename)
>              # See if we're looking at GNU .mo conventions for metadata
>!             if mlen == 0:
>                  # Catalog description
>                  for item in tmsg.split('\n'):
>                      item = item.strip()
>
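The fallback behaviour can be sketched separately from the .mo parsing. The code below is written for Python 3 and only illustrates the idea: the function names plural_expression and germanic_plural are made up, and the header parsing is deliberately simplified compared with what gettext.py does.

```python
def plural_expression(header, default="n != 1"):
    """Return the C plural expression from a catalog header, or the default.

    Hypothetical helper: scans 'Key: value' lines for Plural-Forms and
    picks the 'plural=' field; any header without one gets the default.
    """
    for line in header.split("\n"):
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "plural-forms":
            for field in value.split(";"):
                name, eq, expr = field.partition("=")
                if eq and name.strip() == "plural":
                    return expr.strip()
    return default


def germanic_plural(n):
    """The fallback rule itself: form 0 for exactly one item, form 1 otherwise."""
    return int(n != 1)


if __name__ == "__main__":
    header = ("Content-Type: text/plain; charset=ISO-8859-1\n"
              "Plural-Forms: nplurals=2; plural=(n > 1);\n")
    print(plural_expression(header))                        # (n > 1)
    print(plural_expression("Content-Type: text/plain\n"))  # n != 1
    print([germanic_plural(n) for n in (0, 1, 2)])          # [1, 0, 1]
```

Note that the lookup does not depend on which line comes first in the header, so a catalog without a Project-Id-Version line is handled the same way.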
>
>4) Btw, I have to correct a misimpression. It was claimed in
>http://mail.python.org/pipermail/i18n-sig/2002-November/001514.html
>that GNU xgettext 0.11.5 doesn't support ngettext in Python. But it does
>if you add the command line options "-kgettext -kngettext:1,2". These
>options are needed because, when xgettext 0.11.5 was released, Python
>didn't have the ngettext function, and no one told me that it would.
>
>So for example,
>
>   $ xgettext -kgettext -kngettext:1,2 -o - prog.py
>
>produces the .pot file for prog.py above.
>
>
>Bruno
>
>  
>

-- 
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software