[Python-Dev] Some questions about maintenance of the regular expression code.
M.-A. Lemburg
mal@lemburg.com
Wed, 26 Feb 2003 21:35:34 +0100
Gary Herron wrote:
> On Wednesday 26 February 2003 10:23 am, M.-A. Lemburg wrote:
>>>>>The first glance at the regular expression bug list and the _sre.c
>>>>>code results in the observation that several of the bugs are related
>>>>>to running over the recursion limit. The problem comes from using a
>>>>>pattern containing ".*?" in a situation where it is expected to match
>>>>>many thousands of characters. Each character matched by ".*?" causes
>>>>>one level of recursion, quickly overflowing the recursion limit.
>>>>
>>>>Wouldn't it be possible for the RE compiler to issue a warning in
>>>>case this kind of pattern is used? This would be much more helpful
>>>>than trying to work around the user problem.
>>>
>>>I think not. It's not the pattern that's the problem. A pattern
>>>containing ".*?" is perfectly legitimate and useful.
>>
>>Hmm, could you explain where ".*?" is useful?
>
> Yes, easily. It's the non-greedy version of "match all". The manual
> page for the re module has this nice example:
>
> *?, +?, ??
> The "*", "+", and "?" qualifiers are all greedy; they match as much
> text as possible. Sometimes this behaviour isn't desired; if the RE
> <.*> is matched against '<H1>title</H1>', it will match the entire
> string, and not just '<H1>'. Adding "?" after the qualifier makes it
> perform the match in non-greedy or minimal fashion; as few
> characters as possible will be matched. Using .*? in the previous
> expression will match only '<H1>'.
Ah, ok. I usually write "<[^>]+>" for these things, if at all...
I tend to use mxTextTools for parsing :-)
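
To make the comparison concrete, here is a minimal sketch of the three patterns mentioned so far, run against the string from the manual's example. The variable names are only illustrative, and the snippet assumes nothing beyond the standard re module.

```python
import re

html = "<H1>title</H1>"

# Greedy: ".*" grabs as much as it can, then backs off to the LAST ">".
greedy = re.match(r"<.*>", html).group(0)

# Non-greedy: ".*?" grows one character at a time and stops at the FIRST ">".
lazy = re.match(r"<.*?>", html).group(0)

# Negated character class: never steps over a ">", so no "?" is needed.
negated = re.match(r"<[^>]+>", html).group(0)

print(greedy)   # <H1>title</H1>
print(lazy)     # <H1>
print(negated)  # <H1>
```

The two short forms are not strictly interchangeable: "<[^>]+>" needs at least one character between the brackets, so it rejects "<>", which "<.*?>" accepts. Also, "." does not match a newline by default, whereas "[^>]" does.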
>>>The problem arises when the pattern is used on a string which has
>>>thousands of characters which match. By that point the RE compiler is
>>>right out of the picture.
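
The point about the compiler being out of the picture can be sketched as follows. The input size of 100000 characters is an arbitrary assumption chosen for illustration. With the recursive engine discussed in this thread, the long input is the kind of case that could run into the recursion limit; later CPython releases, as far as I know, no longer recurse per matched character, so there the match should simply succeed.

```python
import re

# Compilation sees only the pattern, never the input, so it cannot
# know whether ".*?" will later have to span 2 characters or 100000.
pattern = re.compile(r"<.*?>")

short_text = "<H1>"
long_text = "<" + "x" * 100000 + ">"   # assumed size, for illustration only

for text in (short_text, long_text):
    try:
        m = pattern.match(text)
        print(len(text), "chars in,", len(m.group(0)), "chars matched")
    except RuntimeError as exc:
        # An engine that recurses once per matched character could end
        # up here with a recursion limit error on the long input.
        print(len(text), "chars in, match failed:", exc)
```

The same compiled pattern object is used for both inputs, so any warning issued at compile time would fire for the harmless case as well.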
--
Marc-Andre Lemburg
eGenix.com