[Python-Dev] Some questions about maintenance of the regular expression code.

M.-A. Lemburg mal@lemburg.com
Wed, 26 Feb 2003 21:35:34 +0100


Gary Herron wrote:
> On Wednesday 26 February 2003 10:23 am, M.-A. Lemburg wrote:
>>>>>The first glance at the regular expression bug list and the _sre.c
>>>>>code results in the observation that several of the bugs are related
>>>>>to running over the recursion limit.  The problem comes from using a
>>>>>pattern containing ".*?" in a situation where it is expected to match
>>>>>many thousands of characters.  Each character matched by ".*?" causes
>>>>>one level of recursion, quickly overflowing the recursion limit.
>>>>
>>>>Wouldn't it be possible for the RE compiler to issue a warning when
>>>>this kind of pattern is used? That would be much more helpful
>>>>than trying to work around the user's problem.
>>>
>>>I think not.  It's not the pattern that's the problem.  A pattern
>>>containing ".*?" is perfectly legitimate and useful.
>>
>>Hmm, could you explain where ".*?" is useful?
> 
> Yes, easily.  It's the non-greedy version of "match all".  The manual
> page for the re module has this nice example:
> 
> *?, +?, ?? 
>   The "*", "+", and "?" qualifiers are all greedy; they match as much
>   text as possible. Sometimes this behaviour isn't desired; if the RE
>   <.*> is matched against '<H1>title</H1>', it will match the entire
>   string, and not just '<H1>'. Adding "?" after the qualifier makes it
>   perform the match in non-greedy or minimal fashion; as few
>   characters as possible will be matched. Using .*? in the previous
>   expression will match only '<H1>'.

Ah, ok. I usually write "<[^>]+>" for these things, if at all...
I tend to use mxTextTools for parsing :-)
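
For comparison, here is a small sketch of the three spellings side by
side, using the '<H1>title</H1>' string from the manual excerpt. The
variable names are only illustrative.

```python
import re

html = "<H1>title</H1>"

# Greedy: ".*" grabs as much as it can, then backs off to the last ">".
greedy = re.match(r"<.*>", html).group(0)

# Non-greedy: ".*?" stops at the first ">" that lets the match succeed.
lazy = re.match(r"<.*?>", html).group(0)

# Negated character class: no ">" can be consumed inside the tag, so the
# match ends at the first ">" without needing a non-greedy qualifier.
negated = re.match(r"<[^>]+>", html).group(0)

print(greedy)   # <H1>title</H1>
print(lazy)     # <H1>
print(negated)  # <H1>
```

The last two agree on this input. They are not interchangeable in
general: "[^>]+" needs at least one character inside the brackets, so
unlike ".*?" it will not match an empty "<>".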

>>>The problem
>>>arises when the pattern is used on a string which has thousands of
>>>characters which match.  By that point the RE compiler is right out of
>>>the picture.
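
A rough sketch of that point: the pattern is compiled once, before any
subject string exists, and the same compiled object is then applied to a
short and a very long input. The 50000-character length is an arbitrary
choice for illustration. Whether the long case fails depends on the
interpreter's matching engine, so the sketch only catches the error
rather than assuming it happens.

```python
import re

# Compilation sees only the pattern text, never the strings matched later.
pattern = re.compile(r"<.*?>")

short = "<" + "x" * 10 + ">"
long_ = "<" + "x" * 50000 + ">"   # arbitrary "many thousands" of characters

for subject in (short, long_):
    try:
        m = pattern.match(subject)
        print(len(subject), "chars: matched up to", m.end())
    except RuntimeError as exc:
        # A recursion-limit failure, if the engine has one, surfaces here
        # at match time, not at compile time.
        print(len(subject), "chars: failed:", exc)
```

Any warning issued at compile time would therefore have to fire for
every use of ".*?", including the many cases where the subject is short
and nothing goes wrong.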

-- 
Marc-Andre Lemburg
eGenix.com
