[I18n-sig] Re: bugs in gettext.py plural handling
Juan David Ibáñez Palomar
j-david@noos.fr
Mon, 10 Mar 2003 15:12:58 +0100
Hi,
I have posted a patch to the tracker [1] which contains
all your fixes. For number 2, I have just caught the
tokenize.TokenError exception.
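For illustration, a minimal sketch of that idea. It is written in present-day Python 3 syntax rather than the Python 2.3 of this thread, and it is not the patch itself. The helper name checked_tokens is hypothetical, and reusing the 'unbalanced parenthesis' message for every tokenizer failure is an assumption.

```python
import io
import tokenize


def checked_tokens(plural):
    """Tokenize a plural expression, reporting tokenizer failures as ValueError.

    Hypothetical helper, not part of gettext.py.
    """
    try:
        return list(tokenize.generate_tokens(io.StringIO(plural).readline))
    except (tokenize.TokenError, SyntaxError):
        # SyntaxError is caught as a precaution (assumption): tokenizer
        # errors have not been reported identically by every Python version.
        raise ValueError('unbalanced parenthesis in plural form')
```

With this, an expression such as '(n == 1' (opening parenthesis never closed) is reported as a ValueError instead of a tokenizer traceback.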
Best regards,
david
[1] https://sourceforge.net/tracker/index.php?func=detail&aid=700839&group_id=5470&atid=305470
Bruno Haible wrote:
>Hi,
>
>Testing GNU gettext's integration test with Python 2.3a2, I see that
>there are several bugs relating to plural forms and the ngettext function.
>
>
>1)
>
>$ python
>import gettext
>germanic = gettext.c2py('!(n == 1)')
>Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 110, in c2py
> stack[-1] += '(%s)' % s
>IndexError: list index out of range
>
>The ! operator is treated incorrectly if not followed by a space.
>
>Here is a fix.
>
>*** gettext.py.bak 2003-02-22 02:28:17.000000000 +0100
>--- gettext.py 2003-02-22 21:37:33.000000000 +0100
>***************
>*** 88,95 ****
> plural = plural.replace('&&', ' and ')
> plural = plural.replace('||', ' or ')
>
>! expr = re.compile(r'\![^=]')
>! plural = expr.sub(' not ', plural)
>
> # Regular expression and replacement function used to transform
> # "a?b:c" to "test(a,b,c)".
>--- 88,95 ----
> plural = plural.replace('&&', ' and ')
> plural = plural.replace('||', ' or ')
>
>! expr = re.compile(r'\!([^=])')
>! plural = expr.sub(' not \\1', plural)
>
> # Regular expression and replacement function used to transform
> # "a?b:c" to "test(a,b,c)".
>
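To see the effect of fix 1 in isolation, here is a small sketch comparing the two patterns from the diff above on the failing input. It uses Python 3 syntax; the variable names are only for illustration.

```python
import re

plural = '!(n == 1)'

# Old pattern: the character following '!' is matched and then thrown away,
# so the opening parenthesis disappears.
old_result = re.compile(r'\![^=]').sub(' not ', plural)

# Fixed pattern: the following character is captured and put back via \1.
new_result = re.compile(r'\!([^=])').sub(r' not \1', plural)

print(repr(old_result))  # ' not n == 1)'
print(repr(new_result))  # ' not (n == 1)'
```

The old result has lost its '(' and so has an unmatched ')', which is what later leads to the IndexError in the parenthesis-handling loop. In both patterns '!=' is left alone because of the [^=] class.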
>
>2) Unbalanced parentheses in a plural expression don't produce the error
>'unbalanced parenthesis in plural form'.
>
>Example:
>$ python
>import gettext
>germanic = gettext.c2py('n =)= 1')
>
>Instead we get a weird error message
>
>tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
>
>Furthermore even if this error were avoided, we would get
>
>IndexError: list index out of range
>
>Here is a fix for the second half of this bug. I don't know Python
>well enough to fix the first half as well.
>
>*** gettext.py.bak 2003-02-22 02:28:17.000000000 +0100
>--- gettext.py 2003-02-22 21:37:33.000000000 +0100
>***************
>*** 104,110 ****
> if c == '(':
> stack.append('')
> elif c == ')':
>! if len(stack) == 0:
> raise ValueError, 'unbalanced parenthesis in plural form'
> s = expr.sub(repl, stack.pop())
> stack[-1] += '(%s)' % s
>--- 104,110 ----
> if c == '(':
> stack.append('')
> elif c == ')':
>! if len(stack) == 1:
> raise ValueError, 'unbalanced parenthesis in plural form'
> s = expr.sub(repl, stack.pop())
> stack[-1] += '(%s)' % s
>
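A simplified sketch of why the test has to be len(stack) == 1. It assumes, as in gettext.py, that the stack starts with one empty string for the outermost level; the "a?b:c" rewriting step is left out, and the function name regroup is hypothetical. Python 3 syntax.

```python
def regroup(plural):
    """Rebuild a plural expression group by group, rejecting a stray ')'."""
    stack = ['']  # the outermost level is always on the stack
    for c in plural:
        if c == '(':
            stack.append('')
        elif c == ')':
            # If only the outermost level is left, there is no '(' to close.
            # Testing for len(stack) == 0 can never trigger here, and the
            # stack[-1] access below would then fail with IndexError.
            if len(stack) == 1:
                raise ValueError('unbalanced parenthesis in plural form')
            s = stack.pop()
            stack[-1] += '(%s)' % s
        else:
            stack[-1] += c
    return stack.pop()
```

With this check, regroup('n =)= 1') raises the intended ValueError, while a balanced expression such as '(n > 1)' comes back unchanged.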
>
>3) Here's my test code (in ISO-8859-1):
>
>===================== prog.py ============================
>import sys
>import gettext
>
>n = int(sys.argv[1])
>
>gettext.textdomain('prog')
>gettext.bindtextdomain('prog', '.')
>
>print gettext.gettext("'Your command, please?', asked the waiter.")
>print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
> % { 'count': n }
>print gettext.gettext("%(oldCurrency)s is replaced by %(newCurrency)s.") \
> % { 'oldCurrency': "FF", 'newCurrency' : "EUR" }
>======================= fr.po ============================
>msgid ""
>msgstr ""
>"Content-Type: text/plain; charset=ISO-8859-1\n"
>"Plural-Forms: nplurals=2; plural=(n > 1);\n"
>
>msgid "'Your command, please?', asked the waiter."
>msgstr "«Votre commande, s'il vous plait», dit le garçon."
>
># Les gateaux allemands sont les meilleurs du monde.
>#, python-format
>msgid "a piece of cake"
>msgid_plural "%(count)d pieces of cake"
>msgstr[0] "un morceau de gateau"
>msgstr[1] "%(count)d morceaux de gateau"
>
># Reverse the arguments.
>#, python-format
>msgid "%(oldCurrency)s is replaced by %(newCurrency)s."
>msgstr "%(newCurrency)s remplace %(oldCurrency)s."
>==========================================================
>
>$ mkdir -p fr/LC_MESSAGES
>$ msgfmt -o fr/LC_MESSAGES/prog.mo fr.po
>$ LANGUAGE= LC_ALL=fr_FR python prog.py 2
>«Votre commande, s'il vous plait», dit le garçon.
>Traceback (most recent call last):
> File "prog.py", line 10, in ?
> print gettext.ngettext("a piece of cake","%(count)d pieces of cake",n) \
> File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 445, in ngettext
> return dngettext(_current_domain, msgid1, msgid2, n)
> File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 437, in dngettext
> return t.ngettext(msgid1, msgid2, n)
> File "/packages/gnu-inst-python/2.3a2/lib/python2.3/gettext.py", line 294, in ngettext
> return self._catalog[(msgid1, self.plural(n))]
>AttributeError: GNUTranslations instance has no attribute 'plural'
>
>Why does it have no 'plural' attribute?
>
>* Testing that the header entry starts with 'Project-Id-Version:' is
>not appropriate because it excludes valid header entries. The gettext
>tools may remove or move this line in future versions.
>
>* libintl and msgfmt assume a fallback of "n != 1" if no Plural-Forms:
>entry is provided. In the same way, self.plural should use "n != 1" as a
>fallback.
>
>Here is the fix for both.
>
>*** gettext.py.bak 2003-02-22 02:28:17.000000000 +0100
>--- gettext.py 2003-02-22 21:37:33.000000000 +0100
>***************
>*** 114,119 ****
>--- 114,121 ----
>
> return eval('lambda n: int(%s)' % plural)
>
>+ _germanic_plural = lambda n: int(n != 1)
>+
>
>
> def _expand_lang(locale):
>***************
>*** 225,230 ****
>--- 227,233 ----
> # Parse the .mo file header, which consists of 5 little endian 32
> # bit words.
> self._catalog = catalog = {}
>+ self.plural = _germanic_plural
> buf = fp.read()
> buflen = len(buf)
> # Are we big endian or little endian?
>***************
>*** 258,264 ****
> else:
> raise IOError(0, 'File is corrupt', filename)
> # See if we're looking at GNU .mo conventions for metadata
>! if mlen == 0 and tmsg.lower().startswith('project-id-version:'):
> # Catalog description
> for item in tmsg.split('\n'):
> item = item.strip()
>--- 261,267 ----
> else:
> raise IOError(0, 'File is corrupt', filename)
> # See if we're looking at GNU .mo conventions for metadata
>! if mlen == 0:
> # Catalog description
> for item in tmsg.split('\n'):
> item = item.strip()
>
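A small sketch of the fallback behaviour described in point 3. The function lookup_plural and the hand-written catalog dict are hypothetical stand-ins for GNUTranslations.ngettext and the parsed .mo file. The catalog is keyed by (msgid1, plural index) as in the traceback above. Python 3 syntax.

```python
def germanic_plural(n):
    """Fallback rule "n != 1", used when no Plural-Forms: entry is present."""
    return int(n != 1)


def french_plural(n):
    """Rule from the fr.po example above: plural=(n > 1)."""
    return int(n > 1)


def lookup_plural(catalog, msgid1, msgid2, n, plural=germanic_plural):
    """Pick the translation for n, falling back to the untranslated msgids."""
    try:
        return catalog[(msgid1, plural(n))]
    except KeyError:
        return msgid1 if n == 1 else msgid2


catalog = {
    ('a piece of cake', 0): 'un morceau de gateau',
    ('a piece of cake', 1): '%(count)d morceaux de gateau',
}
```

For n = 0 the two rules differ: french_plural selects msgstr[0], while the germanic fallback selects msgstr[1]. Without a default rule the lookup cannot happen at all, which is the AttributeError shown above.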
>
>4) Btw, I have to correct a misimpression. It was claimed in
>http://mail.python.org/pipermail/i18n-sig/2002-November/001514.html
>that GNU xgettext 0.11.5 doesn't support ngettext in Python. But it does
>if you add the command line options "-kgettext -kngettext:1,2". The reason
>is that when xgettext 0.11.5 was released, Python didn't have the ngettext
>function, and no one told me that it would.
>
>So for example,
>
> $ xgettext -kgettext -kngettext:1,2 -o - prog.py
>
>produces the .pot file for prog.py above.
>
>
>Bruno
>
>
>
--
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software