leftmost longest match (of disjunctions) ; greediness of "|"
Fredrik Lundh
fredrik at pythonware.com
Tue Dec 2 08:15:27 EST 2003
Joerg Schuster wrote:
> > > O.k. Thanks for pointing this out. Maybe I should have formulated my
> > > question differently: Is there a trick (be it dirty or not) to make
> > > "|" greedy in python?
> >
> > sort the re by size first?
>
> The point is not to match the alternative that has the longest
> regex, but to pick the alternative whose match is the longest.
> (The match of ".*" may be much longer than the match of "abc",
> although the latter regex contains more characters.)
you can use "sre_parse.parse(x).getwidth()" on a subexpression to
get the lengths of the shortest and longest strings it can match:
>>> from sre_parse import parse
>>> parse("a?").getwidth()
(0, 1)
>>> parse("ab").getwidth()
(2, 2)
>>> parse(".+").getwidth()
(1, 65535)
(where a maximum >= 65535 should be interpreted as "no upper limit")
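
The widths above are static bounds on what a subexpression could match, not the length of what it matches in a given string. As a complement, here is a minimal sketch (not from the original post) of one way to get a leftmost-longest result with the public "re" API only: try every alternative at each start position and keep the longest actual match. The helper name leftmost_longest is hypothetical, and the alternatives are assumed to be given as separate pattern strings rather than as one "a|b|c" expression.

```python
import re


def leftmost_longest(alternatives, text):
    """Return the leftmost match; among alternatives matching there, the longest.

    `alternatives` is a list of regex strings (the parts of the disjunction).
    Returns a match object, or None if no alternative matches anywhere.
    """
    compiled = [re.compile(alt) for alt in alternatives]
    # Scan start positions left to right, including the empty tail.
    for start in range(len(text) + 1):
        best = None
        for pattern in compiled:
            # match() with a pos argument anchors the attempt at `start`.
            m = pattern.match(text, start)
            if m is not None and (best is None or m.end() > best.end()):
                best = m
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    # Plain "|" takes the first alternative that succeeds.
    print(re.search("a|abc", "xxabcdef").group())  # a
    # The helper compares the actual match lengths instead.
    print(leftmost_longest(["abc", "ab.*", "a"], "xxabcdef").group())  # abcdef
```

This costs one match attempt per alternative per start position, so it is only a reasonable choice for short alternative lists or short inputs.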
</F>