problem with re
Robert Amesz
reqhye72zux at mailexpire.com
Thu Sep 6 19:38:28 EDT 2001
Patrick Vrijlandt wrote:
> Hi,
>
> I do not understand why the 7th regular expression does not give
> the same result as 2-6.
> mymatch(r'\n'.join([r'(\d{10})|([a-z]{10})', r'([a-z]{10})',
> r'(\d{10})']))
> (This is simplified from something I'm working on. The answer may
> be trivial, but I don't see it)
It's the |-"operator"; from the docs:
"|"
A|B, where A and B can be arbitrary REs, creates a regular
expression that will match either A or B. An arbitrary number of
REs can be separated by the "|" in this way. This can be used
inside groups (see below) as well. REs separated by "|" are tried
from left to right, and the first one that allows the complete
pattern to match is considered the accepted branch. This means
that if A matches, B will never be tested, even if it would
produce a longer overall match. In other words, the "|" operator
is never greedy. To match a literal "|", use \|, or enclose it
inside a character class, as in [|].
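A tiny illustration of the "never greedy" remark, as a sketch only (the
patterns and the test string here are made up for the demonstration, they
are not from your code):

```python
import re

# A = 'a', B = 'ab'. A matches first, so B is never tried,
# even though B would produce a longer match.
short = re.match(r'a|ab', 'ab').group()

# Swapping the branches lets the longer alternative be tried first.
longer = re.match(r'ab|a', 'ab').group()

print(short, longer)
```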
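The | binds more loosely than anything else in the pattern, so in your
joined re A is just (\d{10}) and B is everything after the |, including
the two parts added by the join. It is clear that A (the left-hand part)
*does* match, so B is never tested. The sub-expression with the | therefore
has to be put in a group of its own, which makes the first part of your
join: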
r'(\d{10}|[a-z]{10})'
(That is actually two characters shorter than your expression.)
Haven't tested it, but it should work.
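A quick way to check it, as a minimal sketch: the sample text is an
assumption (your test data wasn't in the post), and plain re.search stands
in for your mymatch function.

```python
import re

# Hypothetical input: three lines of ten digits / letters / digits.
text = "0123456789\nabcdefghij\n9876543210"

# Original: the | splits the WHOLE joined pattern into two alternatives.
ungrouped = r'\n'.join([r'(\d{10})|([a-z]{10})', r'([a-z]{10})', r'(\d{10})'])
# Suggested fix: the alternation is confined to a single group.
grouped = r'\n'.join([r'(\d{10}|[a-z]{10})', r'([a-z]{10})', r'(\d{10})'])

m1 = re.search(ungrouped, text)
m2 = re.search(grouped, text)

print(repr(m1.group(0)), m1.groups())
print(repr(m2.group(0)), m2.groups())
```

Note that the grouped version has three groups instead of four, so any
group numbers used later in the code shift down by one.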
HTH, Robert Amesz