unexpected regexp behaviour using 'A|B|C.....'
__peter__ at web.de
Thu Jul 28 12:57:18 CEST 2011
> When using re patterns of the form 'A|B|C|...' the docs seem to
> suggest that once any of A,B,C.. match, it is captured and no further
> patterns are tried. But I am seeing,
> st=' Id Name Prov Type CopyOf BsId
> Rd -Detailed_State- Adm Snp Usr VSize'
> p='Type *'
> 'Type '
> p='Type *| *Type'
> ' Type'
> Shouldn’t the second search return the same as the first, if further
> patterns are not tried?
> The documentation appears to suggest the first match should be
> returned, or am I misunderstanding?
All alternatives are tried at a given starting position in the string before
the algorithm advances to the next position. The second alternative,
" *Type", matches starting at the space right after "Prov" in your example
(that space followed by the character sequence "Type"). The search stops
there, so the first alternative, "Type" and any following spaces, which
would match one position later, after "Prov ", is never tried at that
position.
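Here is a small sketch of that behaviour. It assumes the single-space
spacing shown in the quoted text (the original column spacing may have
differed), and the variable names are only for illustration:

```python
import re

# The header line from the quoted post, joined into one string
# (assumption: single spaces between the column names).
st = " Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize"

# One alternative only: the match starts at the "T" of "Type".
first = re.search(r"Type *", st)
print(repr(first.group()), first.start())   # 'Type ' 14

# Two alternatives: at index 13 (the space after "Prov") the first
# alternative fails, the second one succeeds, and the search stops there.
both = re.search(r"Type *| *Type", st)
print(repr(both.group()), both.start())     # ' Type' 13

# Swapping the alternatives gives the same result, because the leftmost
# starting position decides here, not the order of the alternatives.
swapped = re.search(r" *Type|Type *", st)
print(repr(swapped.group()))                # ' Type'

# The order only matters when several alternatives can match at the
# same starting position: the first one that matches wins.
same_pos = re.search(r"a|ab", "ab")
print(repr(same_pos.group()))               # 'a'
```

So the "first match wins" rule from the docs applies among alternatives at
one starting position; across positions, the leftmost match wins.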
Maybe you accidentally typed one extra " "? If you didn't " +Type" would be