Bug in regular expressions?

Sean 'Shaleh' Perry shalehperry at attbi.com
Fri May 17 13:57:55 EDT 2002


> 
> So A|B and B|A are not always equivalent. When A and B match, B is ignored
> even if the matched text is longer.
> Is this a bug in the re module?
> Is there a way to tell re to be "totally greedy"?
> 

$ perl -e '$foo="aa";$foo =~ m/(a|aa)/; print $1."\n"'
a
$ perl -e '$foo="aa";$foo =~ m/(aa|a)/; print $1."\n"'
aa

Note that Perl behaves the same way: the leftmost alternative that matches wins.
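
For comparison, here is a minimal sketch of the same experiment using Python's re module. The variable names are only for illustration and are not from the original question.

```python
import re

text = "aa"

# 'a' is listed first and matches, so 'aa' is never tried.
short_first = re.search(r"(a|aa)", text).group(1)

# 'aa' is listed first and matches, so it is the accepted branch.
long_first = re.search(r"(aa|a)", text).group(1)

print(short_first)
print(long_first)
```

The first line printed should be the one-character match and the second the two-character match, mirroring the Perl output above.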

From the Python library reference:

``|''
     `A|B', where A and B can be arbitrary REs, creates a regular
     expression that will match either A or B.  An arbitrary number of
     REs can be separated by the `|' in this way.  This can be used
     inside groups (see below) as well.  REs separated by `|' are tried
     from left to right, and the first one that allows the complete
     pattern to match is considered the accepted branch.  This means
     that if `A' matches, `B' will never be tested, even if it would
     produce a longer overall match.  In other words, the `|' operator
     is never greedy.  To match a literal `|', use "\|", or enclose it
     inside a character class, as in "[|]".
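
As far as I know there is no flag that makes `|' prefer the longest branch, so the usual workaround is to order the alternatives yourself. The sketch below illustrates two points from the quoted text: a later branch is still used when the first one does not let the complete pattern match, and putting longer alternatives first gives the "longest wins" behaviour. The helper longest_first is a hypothetical name of my own, and it assumes the alternatives are literal strings rather than arbitrary REs.

```python
import re

# 'a' matches first, but then '$' fails at position 1, so the engine
# backtracks and accepts the 'aa' branch instead.
anchored = re.match(r"(a|aa)$", "aa").group(1)


def longest_first(alternatives):
    """Compile literal alternatives so that longer ones are tried first.

    Hypothetical helper: it only handles plain strings, which are
    escaped with re.escape before being joined with '|'.
    """
    ordered = sorted(alternatives, key=len, reverse=True)
    return re.compile("|".join(re.escape(alt) for alt in ordered))


pattern = longest_first(["a", "aa", "aaa"])
longest = pattern.match("aaaa").group(0)

print(anchored)
print(pattern.pattern)
print(longest)
```

Sorting by length only helps for literal strings. For alternatives that are themselves patterns, the matched length depends on the input, so you would have to try each one and compare the results.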





