Why no RE match of A AND B?
hst at empolis.co.uk
Mon Mar 3 09:59:02 CET 2003
> Just looking through the docs to confirm it and their seems
> to be no regular expression
> construct that matches 'A and B' - an equivalent to the '|'
> character that matches A or B
> To paraphrase the documentation it could be documented as:
> A&B, where A and B can be arbitrary REs, creates a regular
> expression that will match both
> A and B in any order. An arbitrary number of REs can be
> separated by the "&" in this way.
> This can be used inside groups as well. REs separated by "&"
> are tried in arbitrary order,
> the beginning of the first matching branch up to the end of
> the last branch that allows
> the complete pattern (all branches) to match is considered
> the span of the match.
> I envision it being used where a string is accepted at
> present if it matches multiple REs.
> The REs could be rewritten as r"<RE#1>&<RE2>&<RE3>...".
> The use of "&" could be supported by more efficient,
> (simultaneous?), matching of all
> branches of the "&" construct - if such algorithms exist!
> Just an idea,
One of the simplifications of SGML made in the creation of XML was the dropping of the & operator in content models. I believe there were two reasons for this:
1) & was hardly ever used.
2) Implementation problems. Suppose you have a&b&c&d&e&f&g&h. There are 40320 patterns that match this, but this is more than the number of backtracking states that the Python RE engine supports.
This message has been checked for all known viruses by the MessageLabs Virus Scanning Service.
More information about the Python-list