unexpected regexp behaviour using 'A|B|C.....'
matt.j.warren at gmail.com
Thu Jul 28 11:56:33 CEST 2011
When using re patterns of the form 'A|B|C|...' the docs seem to
suggest that once any of A, B, C, ... matches, it is captured and no
further patterns are tried. But I am seeing:
st=' Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize'
p='Type *| *Type'
Shouldn’t the second search return the same as the first, if further
patterns are not tried?
The documentation appears to suggest the first match should be
returned, or am I misunderstanding?
A|B, where A and B can be arbitrary REs, creates a regular expression
that will match either A or B. An arbitrary number of REs can be
separated by the '|' in this way. This can be used inside groups (see
below) as well. As the target string is scanned, REs separated by '|'
are tried from left to right.
When one pattern completely matches, that branch is accepted. This
means that once A matches, B will not be tested further, even if it
would produce a longer overall match.
In other words, the '|' operator is never greedy. To match a literal
'|', use \|, or enclose it inside a character class, as in [|].
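
A minimal sketch of what is probably going on, assuming the sample string has single spaces between columns as it appears above (the original spacing may differ). The names first and swapped and the 'a|ab' patterns are illustrative and not from the original post. The left-to-right rule applies at each starting position, and re.search() returns the match that starts earliest in the string, whichever branch produced it.

```python
import re

# Sample text and pattern from the post (column spacing assumed to be single spaces).
st = ' Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize'
p = 'Type *| *Type'

# search() tries each start position in turn. At the space before "Type",
# 'Type *' fails (a space is not 'T') but ' *Type' succeeds, so the second
# branch wins because its match starts earlier in the string.
first = re.search(p, st)
print(repr(first.group()))    # ' Type'

# Swapping the branches gives the same result: the leftmost start decides.
swapped = re.search(' *Type|Type *', st)
print(repr(swapped.group()))  # ' Type'

# Branch order only matters when both branches can match at the same position.
print(re.search('a|ab', 'ab').group())  # 'a'
print(re.search('ab|a', 'ab').group())  # 'ab'
```

If this reading is right, the documentation is consistent with the result: "once A matches, B will not be tested" describes what happens at a single position, and says nothing about a branch that can only match further to the right.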