[Python-bugs-list] [ python-Bugs-409311 ] Python 2.1b1 re module is broken!

noreply@sourceforge.net noreply@sourceforge.net
Thu, 22 Mar 2001 09:15:04 -0800


Bugs item #409311, was updated on 2001-03-16 19:40
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=409311&group_id=5470

Category: Regular Expressions
Group: None
>Status: Closed
Priority: 7
Submitted By: Gregory P. Smith (greg)
Assigned to: Fredrik Lundh (effbot)
Summary: Python 2.1b1 re module is broken!

Initial Comment:
the following should -not- match:

$ python
Python 2.1b1 (#1, Mar 12 2001, 18:20:53) 
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"
>>> str = '<dtml-var
expr="Presentation.show(\'start\')">'
>>> import re                                
>>> re.match(reg, str)                       
<SRE_Match object at 0x810d9d0>


In Python 1.5.2 and 2.0 this works fine (no match).
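
For reference, a minimal sketch of the expected behaviour, checked against a current Python 3 `re` module rather than 2.1b1 (that substitution is an assumption). Since the line break inside the reported string is ambiguous, both a newline and a single space are tried; the names `candidates`, `results` and `control`, and the control input `<dtml-var title>`, are illustrative only and do not come from the report.

```python
import re

reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"

# Assumption: the line break in the report is either a newline or a space.
candidates = [
    '<dtml-var\nexpr="Presentation.show(\'start\')">',
    '<dtml-var expr="Presentation.show(\'start\')">',
]

# "expr" is followed by "=", which neither the character class nor
# "\s*>" can consume, so no match is expected for either candidate.
results = [re.match(reg, text) for text in candidates]
print(results)

# Hypothetical control input: a tag the pattern is meant to accept.
control = re.match(reg, "<dtml-var title>")
print(control.group(1))
```

Both candidates should give `None`, while the control should capture `title`.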


----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2001-03-22 09:15

Message:
Logged In: YES 
user_id=38376

fixed in 2.1b2

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2001-03-21 11:03

Message:
Logged In: YES 
user_id=38376

same as #233283

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-03-18 10:58

Message:
Logged In: YES 
user_id=31435

So, Moshe, what's worse:  floating-point or regexps <2/3 
wink>?  For the life of me, I'll never be able to read +? 
as a minimal match -- it's so clearly "match one or more, 
but optionally"!

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2001-03-18 03:38

Message:
Logged In: YES 
user_id=11645

Here is a simpler test case which shows the same
problem:

>>> str, r
('e=>', '(e+?)>')
>>> re.match(r, str)
<SRE_Match object at 0x4015f2e0>
>>> pre.match(r, str)
>>> 

If we lose the laziness (make the pattern "(e+)>") then it
works OK.

So the crucial problem seems to be the compilation/execution
of the lazy patterns, *not* the compilation/execution of
character classes.
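
The reduced case above can be sketched as a small self-check. This is only an illustration run against a current Python 3 `re` module (an assumption; the `pre` module used for comparison above is not available there), and the names `subject`, `lazy`, `greedy` and `control`, plus the control input `ee>`, are made up for the sketch.

```python
import re

subject = "e=>"

# "e" is followed by "=", not ">", so neither the lazy nor the greedy
# form of the pattern should be able to match at position 0.
lazy = re.match(r"(e+?)>", subject)
greedy = re.match(r"(e+)>", subject)
print(lazy, greedy)

# Hypothetical control input: the lazy quantifier must still extend far
# enough for the trailing ">" to match.
control = re.match(r"(e+?)>", "ee>")
print(control.group(1))
```

A correct engine gives no match for either pattern on `e=>`, and the control captures `ee`.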


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-03-17 22:33

Message:
Logged In: YES 
user_id=31435

Assigned to Fredrik and boosted priority.

Gregory, it's hard to see exactly what your str vrbl 
contains because there appears to be an embedded newline in 
it.  Whatever, if I change your

    +?

to the semantically equivalent

    *

then the problem goes away for what *I* guessed you 
intended str to contain.  The

    [a-z_0-9]

part is also better written as

    \w

(since you're using the ?i flag, same thing).
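
The character-class remark can be checked with a short sketch. It assumes a current Python 3 `re` module and adds the `re.ASCII` flag (not part of the original pattern) so that `\w` is limited to `[a-zA-Z0-9_]`; without it, `\w` on Python 3 strings also accepts non-ASCII word characters. The names `char_class`, `word_class` and `differences` are illustrative.

```python
import re

# Assumption: re.ASCII is added so that \w means exactly [a-zA-Z0-9_].
char_class = re.compile(r"(?i)[a-z_0-9]", re.ASCII)
word_class = re.compile(r"\w", re.ASCII)

# Collect every ASCII character on which the two patterns disagree.
differences = [
    ch
    for ch in (chr(code) for code in range(128))
    if bool(char_class.fullmatch(ch)) != bool(word_class.fullmatch(ch))
]
print(differences)
```

Under those assumptions the list of differences comes out empty.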

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-03-17 22:20

Message:
Logged In: YES 
user_id=31435

Just adding a comment to force SF to send this as email (so 
I can read it).

----------------------------------------------------------------------
