problem with re

Robert Amesz
Fri Sep 7 01:38:28 CEST 2001

Patrick Vrijlandt wrote:

> Hi,
> I do not understand why the 7th regular expression does not give
> the same result as 2-6.

> mymatch(r'\n'.join([r'(\d{10})|([a-z]{10})', r'([a-z]{10})',
> r'(\d{10})'])) 

> (This is simplified from something I'm working on. The answer may
> be trivial, but I don't see it)

It's the |-"operator"; from the docs:

    A|B, where A and B can be arbitrary REs, creates a regular
    expression that will match either A or B. An arbitrary number of
    REs can be separated by the "|" in this way. This can be used
    inside groups (see below) as well. REs separated by "|" are tried
    from left to right, and the first one that allows the complete
    pattern to match is considered the accepted branch. This means
    that if A matches, B will never be tested, even if it would
    produce a longer overall match. In other words, the "|" operator
    is never greedy. To match a literal "|", use \|, or enclose it
    inside a character class, as in [|].
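
A minimal sketch of the behaviour the docs describe, using made-up toy patterns (`a|ab` and friends are illustrations only, not taken from the original question):

```python
import re

# "|" accepts the first branch that lets the pattern match,
# even when a later branch would match more text.
first = re.match(r'a|ab', 'ab')
print(first.group())        # a

# Swapping the branches changes which one wins.
swapped = re.match(r'ab|a', 'ab')
print(swapped.group())      # ab

# A literal "|" needs a backslash or a character class.
escaped = re.match(r'a\|b', 'a|b')
in_class = re.match(r'a[|]b', 'a|b')
print(escaped.group(), in_class.group())   # a|b a|b
```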

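It is clear that A (the left-hand part) of your re *does* match, so B
is never tested. Note that | binds more loosely than anything else in
the pattern, so B is everything to the right of the |, including the
\n and the parts joined after it. So the sub-expression with the | has
to be put in a group, which would make the first part of your join: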


(That is actually two characters shorter than your expression.)

Haven't tested it, but it should work.
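
A rough sketch of the difference, for anyone who wants to try it. The grouped pattern below is one way to write "put the | in a group" and is an assumption on my part; the sample text and the variable names are invented for the illustration, and `mymatch` from the question is not reproduced here.

```python
import re

# Hypothetical input: three lines of ten characters each.
text = '0123456789\nabcdefghij\n0123456789'

# The pattern as built in the question: "|" splits the WHOLE pattern in two.
ungrouped = r'\n'.join([r'(\d{10})|([a-z]{10})', r'([a-z]{10})', r'(\d{10})'])

# Assumed fix: keep the alternation inside a single group.
grouped = r'\n'.join([r'(\d{10}|[a-z]{10})', r'([a-z]{10})', r'(\d{10})'])

m_ungrouped = re.match(ungrouped, text)
m_grouped = re.match(grouped, text)

# Branch A, (\d{10}), matches on its own, so the rest is never tried.
print(m_ungrouped.group(0))   # 0123456789
print(m_ungrouped.groups())   # ('0123456789', None, None, None)

# With the group, all three lines have to match.
print(m_grouped.group(0) == text)   # True
print(m_grouped.groups())     # ('0123456789', 'abcdefghij', '0123456789')
```

Note that this grouping leaves three capturing groups instead of four. If the digits and the letters must stay in separate groups, wrapping the original alternation in a non-capturing group, `(?:(\d{10})|([a-z]{10}))`, should also work.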

HTH, Robert Amesz
