[Python-bugs-list] [ python-Bugs-409311 ] Python 2.1b1 re module is broken!
noreply@sourceforge.net
Thu, 22 Mar 2001 09:15:04 -0800
Bugs item #409311, was updated on 2001-03-16 19:40
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=409311&group_id=5470
Category: Regular Expressions
Group: None
>Status: Closed
Priority: 7
Submitted By: Gregory P. Smith (greg)
Assigned to: Fredrik Lundh (effbot)
Summary: Python 2.1b1 re module is broken!
Initial Comment:
the following should -not- match:
$ python
Python 2.1b1 (#1, Mar 12 2001, 18:20:53)
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"
>>> str = '<dtml-var
expr="Presentation.show(\'start\')">'
>>> import re
>>> re.match(reg, str)
<SRE_Match object at 0x810d9d0>
In python 1.5.2 and 2.0 this works fine.
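For anyone re-checking this report on a current interpreter, here is a minimal sketch. It assumes Python 3, and it assumes the wrapped line in the report stood for either a single space or a newline; both are tried because the original string is ambiguous. The names subject_space and subject_newline are illustrative only and do not come from the report.

```python
import re

# Pattern copied verbatim from the report.
reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"

# Assumption: the line break in the report is a space or a newline.
subject_space = '<dtml-var expr="Presentation.show(\'start\')">'
subject_newline = '<dtml-var\nexpr="Presentation.show(\'start\')">'

# After "expr" the next character is "=", which neither \s* nor ">"
# accepts, so a correct engine finds no match for either variant.
result_space = re.match(reg, subject_space)
result_newline = re.match(reg, subject_newline)
print(result_space, result_newline)
```

Both results being None is the behaviour the report describes for 1.5.2 and 2.0.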
----------------------------------------------------------------------
>Comment By: Fredrik Lundh (effbot)
Date: 2001-03-22 09:15
Message:
Logged In: YES
user_id=38376
fixed in 2.1b2
----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot)
Date: 2001-03-21 11:03
Message:
Logged In: YES
user_id=38376
same as #233283
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-03-18 10:58
Message:
Logged In: YES
user_id=31435
So, Moshe, what's worse: floating-point or regexps <2/3 wink>?
For the life of me, I'll never be able to read +? as a minimal
match -- it's so clearly "match one or more, but optionally"!
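The two readings of +? can be told apart with a short sketch, assuming a current Python 3 re module. The variable names are illustrative. The lazy form still needs at least one repetition, whereas an optional group or * accepts the empty string.

```python
import re

# "+?" requires at least one repetition; it only prefers the fewest.
lazy_first = re.match(r"(a+?)", "aaa").group(1)
greedy_first = re.match(r"(a+)", "aaa").group(1)

# Reading "+?" as "one or more, but optionally" would make it accept
# an empty string, the way "(?:a+)?" and "a*" do.
lazy_on_empty = re.fullmatch(r"a+?", "")
optional_group_on_empty = re.fullmatch(r"(?:a+)?", "")
star_on_empty = re.fullmatch(r"a*", "")

print(lazy_first, greedy_first)
print(lazy_on_empty, optional_group_on_empty, star_on_empty)
```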
----------------------------------------------------------------------
Comment By: Moshe Zadka (moshez)
Date: 2001-03-18 03:38
Message:
Logged In: YES
user_id=11645
Here is a simpler test case which shows the same problem:
>>> str, r
('e=>', '(e+?)>')
>>> re.match(r, str)
<SRE_Match object at 0x4015f2e0>
>>> pre.match(r, str)
>>>
If we lose the laziness (make the pattern "(e+)>") then it
works OK.
So the crucial problem seems to be the compilation/execution
of the lazy patterns, *not* the compilation/execution of
character classes.
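The reduced case can be turned into a small regression check. This is a sketch assuming Python 3, where the old pre module is no longer available, so only re is exercised. The helper name matches and the "eee>" control string are additions for illustration.

```python
import re

def matches(pattern, text):
    """Return True if pattern matches at the start of text."""
    return re.match(pattern, text) is not None

# The subject has "=" between "e" and ">", so neither form may match.
lazy_result = matches(r"(e+?)>", "e=>")
greedy_result = matches(r"(e+)>", "e=>")

# Control: the lazy form must still grow until ">" can match.
control = re.match(r"(e+?)>", "eee>")
control_group = control.group(1) if control else None

print(lazy_result, greedy_result, control_group)
```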
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-03-17 22:33
Message:
Logged In: YES
user_id=31435
Assigned to Fredrik and boosted priority.
Gregory, it's hard to see exactly what your str variable
contains because there appears to be an embedded newline in
it. Whatever, if I change your
+?
to the semantically equivalent
*
then the problem goes away for what *I* guessed you
intended str to contain. The
[a-z_0-9]
part is also better written as
\w
(since you're using the ?i flag, same thing).
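The claim that the character class and \w coincide under the i flag can be checked over the ASCII range. A sketch, assuming Python 3: the re.ASCII flag is added here as an assumption, because without it \w on str patterns also matches non-ASCII word characters, and then the two forms are no longer the same.

```python
import re

# Case-insensitive class from the report versus \w, both restricted
# to ASCII semantics (the re.ASCII flag is an assumption added here).
bracket = re.compile(r"[a-z_0-9]", re.IGNORECASE | re.ASCII)
word = re.compile(r"\w", re.ASCII)

# Collect every ASCII character on which the two patterns disagree.
disagreements = [
    chr(code)
    for code in range(128)
    if bool(bracket.fullmatch(chr(code))) != bool(word.fullmatch(chr(code)))
]
print(disagreements)
```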
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-03-17 22:20
Message:
Logged In: YES
user_id=31435
Just adding a comment to force SF to send this as email (so
I can read it).
----------------------------------------------------------------------