problem with re

Robert Amesz reqhye72zux at mailexpire.com
Thu Sep 6 19:38:28 EDT 2001


Patrick Vrijlandt wrote:

> Hi,
> 
> I do not understand why the 7th regular expression does not give
> the same result as 2-6.

> mymatch(r'\n'.join([r'(\d{10})|([a-z]{10})', r'([a-z]{10})',
> r'(\d{10})'])) 

> (This is simplified from something I'm working on. The answer may
> be trivial, but I don't see it)

It's the "|" operator. From the docs:

"|" 
    A|B, where A and B can be arbitrary REs, creates a regular
    expression that will match either A or B. An arbitrary number of
    REs can be separated by the "|" in this way. This can be used
    inside groups (see below) as well. REs separated by "|" are tried
    from left to right, and the first one that allows the complete
    pattern to match is considered the accepted branch. This means
    that if A matches, B will never be tested, even if it would
    produce a longer overall match. In other words, the "|" operator
    is never greedy. To match a literal "|", use \|, or enclose it
    inside a character class, as in [|].
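A quick way to see that left-to-right rule in action. This is a minimal sketch using only the standard re module; the sample strings are made up for illustration:

```python
import re

# Branches are tried left to right; the first one that lets the whole
# pattern succeed wins, even when a later branch would match more.
print(re.match(r"a|ab", "ab").group())      # prints: a

# A branch is abandoned only if the rest of the pattern then fails:
# "a" is tried first, "c" fails against "b", so the engine backs up
# and uses "ab" instead.
print(re.match(r"(a|ab)c", "abc").group())  # prints: abc
```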


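The catch is that "|" has the lowest precedence of all RE operators, so
it splits your *whole* joined pattern, not just its first line: A (the
left-hand part) is only r'(\d{10})', and B is everything to the right of
the "|", including the two lines that follow. When the text starts with
ten digits, A *does* match, and B is never tested. So the sub-expression
with the "|" has to be put in a group, which makes the first part of
your join: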

    r'(\d{10}|[a-z]{10})'

(That is actually two characters shorter than your expression.)
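Here is a sketch comparing the two patterns on two hypothetical inputs (digits_first and letters_first are made-up sample data, since the original input wasn't posted). It assumes the pattern is applied with re.match; the mymatch helper isn't shown, so that part is a guess:

```python
import re

# Hypothetical sample data: three lines of ten characters each.
digits_first = "0123456789\nabcdefghij\n9876543210"
letters_first = "abcdefghij\nklmnopqrst\n0123456789"

# Original pattern. Because "|" binds loosest, re reads it as
#   (\d{10})   OR   ([a-z]{10})\n([a-z]{10})\n(\d{10})
ungrouped = r"\n".join([r"(\d{10})|([a-z]{10})", r"([a-z]{10})", r"(\d{10})"])

# Suggested fix: the parentheses confine the alternation to line one.
grouped = r"\n".join([r"(\d{10}|[a-z]{10})", r"([a-z]{10})", r"(\d{10})"])

for name, text in [("digits_first", digits_first),
                   ("letters_first", letters_first)]:
    # Show how much of the text each pattern actually consumed.
    print(name, "ungrouped:", repr(re.match(ungrouped, text).group(0)))
    print(name, "grouped:", re.match(grouped, text).groups())
```

With digits on the first line, the ungrouped pattern stops after that line and matches only '0123456789'; with letters first, branch A fails and branch B happens to cover all three lines, which is why the ungrouped pattern can appear to work on some inputs. The grouped pattern matches all three lines in both cases. Note that it has three capturing groups instead of four, so group numbers shift; r'(?:\d{10}|[a-z]{10})' groups without capturing if that matters.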


Haven't tested it, but it should work.


HTH, Robert Amesz
