[ #456742 ] Failing test case for .*?
Tim Peters
tim.one at home.com
Sat Nov 3 13:06:03 EST 2001
[A.M. Kuchling]
> +? is a non-greedy +, and it's not equivalent to (...+)?.
>
> Here's a test program:
>
> import sre
> s = "a\nb\na1"
>
> # Original, buggy pattern
> p = sre.compile(r" [^\n]+? \d", sre.DEBUG | sre.VERBOSE)
> m = p.search(s)
> print (m and m.groups())
This is chasing an illusion: print m.group(0) instead. Since this pattern
contains no explicit capturing groups, m.groups() can't produce anything
other than an empty tuple. Here in a simpler setting:
>>> m = re.match('a', 'a')
>>> m.groups() # useless for a pattern without capturing groups
()
>>> m.group(0) # useful
'a'
>>>
> # Add a group
> p = sre.compile(r" ([^\n]+?) \d", sre.DEBUG | sre.VERBOSE)
> m = p.search(s)
> print (m and m.groups())
>
> When I run with the current CVS Python, two different results are
> produced, even though the only difference is adding a pair of
> parentheses:
Yes, and that means .groups() is working correctly in both cases. Back to
the simpler example:
>>> m = re.match('(a)', 'a')
>>> m.groups()
('a',)
>>> m.group(0)
'a'
>>>
The bug is that the pattern should have found just the 'a1' tail, not all of
s; it's the same bug in both cases:
import re
s = "a\nb\na1"
m = re.search(r'[^\n]+?\d', s)
print m and `m.group(0)` # prints 'a\nb\na1'; should have been 'a1'
m = re.search(r'([^\n]+?)\d', s)
print m and `m.group(0)` # also prints 'a\nb\na1'
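For anyone re-running this today, here is a rough sketch of the same check in Python 3 syntax. It assumes a current interpreter whose re module handles the non-greedy +? as intended; the names plain and grouped are just illustrative. Scanning left to right, no attempt starting at 'a' or 'b' on the first two lines can succeed, because [^\n] can't cross the newline and no digit follows, so the first match should start at the final 'a'.

```python
import re

s = "a\nb\na1"

# Same two patterns as above: without and with a capturing group.
plain = re.search(r"[^\n]+?\d", s)
grouped = re.search(r"([^\n]+?)\d", s)

# group(0) is the whole match and should be identical in both cases.
print(repr(plain.group(0)))    # 'a1'
print(repr(grouped.group(0)))  # 'a1'

# groups() only reports explicit capturing groups, so it differs.
print(plain.groups())          # ()
print(grouped.groups())        # ('a',)
```

The point is the same as before: compare group(0) to judge what the pattern matched, and expect groups() to differ purely because of the added parentheses.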