leftmost longest match (of disjunctions) ; greediness of "|"

Fredrik Lundh fredrik at pythonware.com
Tue Dec 2 08:15:27 EST 2003


Joerg Schuster wrote:

> > > O.k. Thanks for pointing this out. Maybe I should have formulated my
> > > question differently: Is there a trick (be it dirty or not) to make
> > > "|" greedy in python?
> >
> > sort the re by size first?
>
> The point is not to get the match of the longest part of the
> disjunction, but to get the match of that part of the disjunction
> which is the longest one. (The match of ".*" may be much longer
> than the match of "abc", although the latter regex contains more
> characters.)

you can use "sre_parse.parse(x).getwidth()" on a subexpression to
get the width of its shortest and longest possible match, as a
(min, max) tuple.

>>> from sre_parse import parse
>>> parse("a?").getwidth()
(0, 1)
>>> parse("ab").getwidth()
(2, 2)
>>> parse(".+").getwidth()
(1, 65535)

(where an upper bound >= 65535 should be interpreted as "any number",
i.e. unbounded)
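
a rough sketch of how this could be applied to the original question.
the helper names (sort_by_max_width, leftmost_longest) are made up for
illustration and are not part of the re module. the first helper
orders the alternatives by their maximum possible width, which is only
a static heuristic: it looks at the patterns, not at the text. the
second is a different technique that does not use getwidth() at all:
it searches with each alternative separately and keeps the match that
starts earliest and, among those, ends latest. it assumes each
alternative on its own already matches greedily. the import fallback
assumes that newer Pythons (3.11+) keep the parser in the private
module re._parser, which may change without notice.

```python
import re

try:
    # Python 3.11+: the parser lives in a private submodule (assumption:
    # private API, may change between releases)
    from re import _parser as sre_parse
except ImportError:
    # older Pythons, including the one used in the session above
    import sre_parse


def sort_by_max_width(alternatives):
    """Order subexpressions so the one that *can* match the most text
    comes first. Hypothetical helper; purely static heuristic."""
    return sorted(
        alternatives,
        key=lambda pat: sre_parse.parse(pat).getwidth()[1],  # max width
        reverse=True,
    )


def leftmost_longest(alternatives, text):
    """Try every alternative on its own and return the match object that
    starts leftmost and, among those, is longest. Returns None if no
    alternative matches. Hypothetical helper."""
    best = None
    for pat in alternatives:
        m = re.search(pat, text)
        if m is None:
            continue
        # earlier start wins; on a tie, the later end wins
        if best is None or (m.start(), -m.end()) < (best.start(), -best.end()):
            best = m
    return best


alts = ["a", "ab", "abc"]

# plain "|" takes the first alternative that matches
first = re.search("|".join(alts), "xxabcd")

# sorting by maximum width puts the widest alternative first
ordered = sort_by_max_width(alts)
widest_first = re.search("|".join(ordered), "xxabcd")

# trying each alternative separately and comparing the actual matches
longest = leftmost_longest(alts, "xxabcd")
```

note that the sorted version can still pick a shorter match when a
"wide" alternative such as ".*" happens to match less text than a
"narrow" one at a given position. the second helper compares the
actual matches, at the cost of one search per alternative.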

</F>







