Bug in regular expressions?
Sean 'Shaleh' Perry
shalehperry at attbi.com
Fri May 17 13:57:55 EDT 2002
>
> So A|B and B|A are not always equivalent. When A and B match, B is ignored
> even if the matched text is longer.
> Is this a bug in the re module?
> Is there a way to tell re to be "totally greedy"?
>
$ perl -e '$foo="aa";$foo =~ m/(a|aa)/; print $1."\n"'
a
$ perl -e '$foo="aa";$foo =~ m/(aa|a)/; print $1."\n"'
aa
Note that Perl behaves the same way, so this is not a bug in the re module.
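
For comparison, here is a rough Python counterpart of the two Perl one-liners. The variable names are mine and purely illustrative:

```python
import re

text = "aa"

# Alternatives are tried left to right; the first branch that lets the
# whole pattern succeed wins, regardless of how much text it consumes.
short_first = re.match(r"(a|aa)", text).group(1)
long_first = re.match(r"(aa|a)", text).group(1)

print(short_first)  # a
print(long_first)   # aa
```
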
From the Python library reference:
``|''
`A|B', where A and B can be arbitrary REs, creates a regular
expression that will match either A or B. An arbitrary number of
REs can be separated by the `|' in this way. This can be used
inside groups (see below) as well. REs separated by `|' are tried
from left to right, and the first one that allows the complete
pattern to match is considered the accepted branch. This means
that if `A' matches, `B' will never be tested, even if it would
produce a longer overall match. In other words, the `|' operator
is never greedy. To match a literal `|', use "\|", or enclose it
inside a character class, as in "[|]".
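
As far as I know, re has no flag that makes `|' pick the longest branch. Below is a minimal sketch of two workarounds. It assumes you have the alternatives as a list of separate patterns, and the helper name longest_match is my own, not part of the re module. The first workaround puts the longer literal first. The second tries every alternative at the same position and keeps the longest match. The last lines show what "allows the complete pattern to match" means: a later branch is still tried if the rest of the pattern fails after the first one.

```python
import re

def longest_match(alternatives, text, pos=0):
    """Hypothetical helper: try each alternative at pos, keep the longest match."""
    best = None
    for alt in alternatives:
        m = re.compile(alt).match(text, pos)
        if m is not None and (best is None or m.end() > best.end()):
            best = m
    return best

# Workaround 1: for literal alternatives, sort longest first before joining.
words = ["a", "aa"]
pattern = "|".join(sorted((re.escape(w) for w in words), key=len, reverse=True))
sorted_result = re.match(pattern, "aa").group(0)

# Workaround 2: pick the longest match among separately tried alternatives.
longest_result = longest_match(["a", "aa"], "aa").group(0)

# With an anchor, the first branch "a" fails at "$", so the regex engine
# backtracks and the second branch "aa" is accepted.
anchored_result = re.match(r"(a|aa)$", "aa").group(1)

print(sorted_result)    # aa
print(longest_result)   # aa
print(anchored_result)  # aa
```

Sorting by length only works for literal strings. For general sub-patterns, you cannot know the match length without actually trying them, which is why the helper tries each one.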