leftmost longest match (of disjunctions)

Peter Hansen peter at engcorp.com
Mon Dec 1 12:36:11 EST 2003


Joerg Schuster wrote:
> 
> The program given below returns the lines:
> 
> a
> ab
> 
> Is there a way to use python regular expressions such that the program
> would return the following lines?
> 
> ab
> ab
> 
> ########################################################################
> 
> import re
> 
> rx1 = re.compile("(a|ab)")
> rx2 = re.compile("(ab|a)")

Have you checked the documentation for "re"?

It reads:
"|"   A|B, where A and B can be arbitrary REs, creates a regular expression 
that will match either A or B. An arbitrary number of REs can be separated 
by the "|" in this way. This can be used inside groups (see below) as well. 
As the target string is scanned, REs separated by "|" are tried from left to 
right. When one pattern completely matches, that branch is accepted. This 
means that once A matches, B will not be tested further, even if it would 
produce a longer overall match.  In other words, the "|" operator is never 
greedy. 

------
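
A minimal sketch of that behaviour, using the two patterns from your post.
The rest of your program isn't shown here, so the input string "ab" and the
use of re.match are my assumptions:

```python
import re

text = "ab"  # assumed input; the original program's input is not shown

# Alternatives are tried left to right, and the first branch that matches wins.
first = re.match(r"(a|ab)", text).group(1)   # the "a" branch matches, so "ab" is never tried
second = re.match(r"(ab|a)", text).group(1)  # the longer branch is listed first

print(first)
print(second)
```

With that input, the first pattern should give "a" and the second "ab", which
is the output you describe.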

Seems pretty clear and explicit to me.  Your example is basically a working
demonstration of the documentation quoted above, so I'm not sure what you
expected to behave differently.
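
If what you want is the longest match no matter how the alternatives are
ordered, one possible workaround is to try each alternative separately and
keep the best result.  This is only a sketch: leftmost_longest is a
hypothetical helper of mine, not something the re module provides, and each
individual alternative is still matched with re's usual rules, so it is not
a full POSIX-style leftmost-longest engine.

```python
import re

def leftmost_longest(alternatives, text):
    """Hypothetical helper: return the match that starts earliest in text
    and, among those, ends latest. Returns None if nothing matches."""
    best = None
    for alt in alternatives:
        m = re.search(alt, text)
        if m is None:
            continue
        # Prefer the earlier start; break ties by the later end.
        if best is None or (m.start(), -m.end()) < (best.start(), -best.end()):
            best = m
    return best

for alts in (["a", "ab"], ["ab", "a"]):
    m = leftmost_longest(alts, "ab")
    print(m.group(0))
```

Both orderings should then print "ab".  If the alternatives are plain
literal strings, simply listing the longer ones first in the pattern, as
your rx2 does, gives the same result with a single compiled expression.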

-Peter



