[Python-bugs-list] [ python-Bugs-817234 ] re.finditer hangs on final empty match

SourceForge.net noreply at sourceforge.net
Fri Oct 3 14:16:04 EDT 2003


Bugs item #817234, was opened at 2003-10-03 09:01
Message generated for change (Comment added) made by kevinbutler
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=817234&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Kevin J. Butler (kevinbutler)
Assigned to: Fredrik Lundh (effbot)
Summary: re.finditer hangs on final empty match

Initial Comment:
The iterator returned by re.finditer appears not to terminate if the final match is empty. Instead, it keeps returning the final (empty) match.

Is this a bug in _sre? If so, I'll be happy to file it, though fixing it is a bit beyond my _sre experience level at this point. The solution would appear to be either to check for a duplicate match in iterator.next(), or to increment the position by one after returning an empty match (which should be OK, because if a non-empty match started at that location, we would have returned it instead of the empty match).
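The second option can be sketched in modern Python 3 using a compiled pattern's search(string, pos). This is not the _sre implementation, and the function name finditer_sketch is hypothetical. After an empty match the scan position moves forward one character, so the loop cannot return the same empty match twice.

```python
import re
from typing import Iterator


def finditer_sketch(pattern: str, string: str) -> Iterator["re.Match[str]"]:
    """Hypothetical finditer variant that steps one character past each empty match."""
    regex = re.compile(pattern)
    pos = 0
    # Stop once pos moves beyond the end, instead of relying on how
    # search() treats a pos past len(string).
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            return
        yield m
        if m.start() == m.end():
            # Empty match: step one character forward so the same
            # empty match cannot be returned again.
            pos = m.end() + 1
        else:
            # Non-empty match: resume scanning where it ended.
            pos = m.end()


print([m.span() for m in finditer_sketch(".*", "asdf")])  # [(0, 4), (4, 4)]
```

On the report's input this yields the same two matches that findall returns and then stops.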



Code to illustrate the failure:

from re import finditer

last = None
for m in finditer( ".*", "asdf" ):
    if last == m.span():
        print "duplicate match:", last
        break
    print m.group(), m.span()
    last = m.span()

---
asdf (0, 4)
 (4, 4)
duplicate match: (4, 4)
---



findall works:

import re
print re.findall( ".*", "asdf" )
['asdf', '']



The workaround is to explicitly check for a duplicate span, as I did above, or to check for a duplicate end(), which avoids the final empty match.
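Both workarounds fit in a small generator that wraps any iterator of match objects. This is a sketch in modern Python 3, and the names stop_on_repeat and simulated_buggy_finditer are hypothetical. Because current interpreters no longer hang on this input, the repeating iterator below only imitates the Python 2.3 behaviour.

```python
import itertools
import re


def stop_on_repeat(matches, key=lambda m: m.span()):
    """Yield matches until key(match) equals the previous match's key.

    key = span  -> stop at the first duplicate span (keeps the final empty match)
    key = end() -> stop at the first duplicate end() (drops the final empty match)
    """
    last = None
    for m in matches:
        k = key(m)
        if k == last:
            break
        yield m
        last = k


def simulated_buggy_finditer():
    """Imitate the hang: (0, 4) once, then the empty match (4, 4) forever."""
    return itertools.chain([re.match(".*", "asdf")],
                           itertools.repeat(re.search("$", "asdf")))


print([m.span() for m in stop_on_repeat(simulated_buggy_finditer())])
# [(0, 4), (4, 4)]
print([m.span() for m in stop_on_repeat(simulated_buggy_finditer(), key=lambda m: m.end())])
# [(0, 4)]
```

Keying on the span keeps the final empty match, as the code above does; keying on end() drops it.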



Seo Sanghyeon sent the following fix to the python-dev list:

The attached one-line patch fixes the re.finditer bug reported by Kevin J. Butler. I read the CVS log to find out why this code was introduced, and it seems to be related to SF bug #581080.

But that bug didn't appear after my patch, so I wonder why it was introduced in the first place. It seems beyond my understanding. Please enlighten me.



To test:

#581080
import re
list(re.finditer(r'\s', 'a b'))
# expected: one item list
# bug: hang

#Kevin J. Butler
import re
list(re.finditer('.*', 'asdf'))
# expected: two item list (?)
# bug: hang
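The two cases can also be written as a self-checking script for a modern Python 3 interpreter. Capping the iterator with itertools.islice is a choice made here, not in the original report, so that on an interpreter with the bug the assertions fail instead of hanging. The helper name capped_spans is hypothetical.

```python
import itertools
import re


def capped_spans(pattern, string, cap=10):
    """Return at most `cap` finditer spans, so a non-terminating iterator fails instead of hanging."""
    return [m.span() for m in itertools.islice(re.finditer(pattern, string), cap)]


# SF bug #581080: exactly one whitespace match in "a b".
assert capped_spans(r"\s", "a b") == [(1, 2)]

# SF bug #817234: ".*" matches "asdf", then the empty string at the end, as findall does.
assert capped_spans(".*", "asdf") == [(0, 4), (4, 4)]
assert [m.group() for m in itertools.islice(re.finditer(".*", "asdf"), 10)] == re.findall(".*", "asdf")
```

This also answers the "(?)" above: the expected result is two matches, the same items findall returns.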



Seo Sanghyeon

-------------- next part --------------

? patch
Index: Modules/_sre.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v
retrieving revision 2.99
diff -c -r2.99 _sre.c
*** Modules/_sre.c	26 Jun 2003 14:41:08 -0000	2.99
--- Modules/_sre.c	2 Oct 2003 03:48:55 -0000
***************
*** 3062,3069 ****
      match = pattern_new_match((PatternObject*) self->pattern,
                                 state, status);
  
!     if ((status == 0 || state->ptr == state->start) &&
!         state->ptr < state->end)
          state->start = (void*) ((char*) state->ptr + state->charsize);
      else
          state->start = state->ptr;
--- 3062,3068 ----
      match = pattern_new_match((PatternObject*) self->pattern,
                                 state, status);
  
!     if (status == 0 || state->ptr == state->start)
          state->start = (void*) ((char*) state->ptr + state->charsize);
      else
          state->start = state->ptr;
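A simplified Python model of the position-advance rule in this hunk may make the hang easier to see. It is not the C code: it assumes that after a search state->start holds the match start and state->ptr the match end, it assumes a search starting past the end of the string finds nothing, and the function name model_scan is hypothetical. With the state->ptr < state->end guard, an empty match at the very end leaves the start position unchanged, so the next search finds the same empty match again.

```python
import re


def model_scan(pattern, string, end_guard, limit=6):
    """Simplified model of the scanner's start-advance rule.

    end_guard=True  mirrors rev 2.99: advance only if the match was empty AND ptr < end.
    end_guard=False mirrors the patch: advance whenever the match was empty.
    At most `limit` spans are collected so a non-terminating scan stays visible.
    """
    regex = re.compile(pattern)
    end = len(string)
    start = 0
    spans = []
    while len(spans) < limit:
        # Model assumption: a search starting past the end finds nothing (status == 0).
        m = regex.search(string, start) if start <= end else None
        if m is None:
            break
        spans.append(m.span())
        match_start, ptr = m.start(), m.end()
        empty = ptr == match_start
        if empty and (ptr < end or not end_guard):
            start = ptr + 1   # like state->start = state->ptr + state->charsize
        else:
            start = ptr       # like state->start = state->ptr
    return spans


print(model_scan(".*", "asdf", end_guard=True))   # [(0, 4), (4, 4), (4, 4), (4, 4), (4, 4), (4, 4)]
print(model_scan(".*", "asdf", end_guard=False))  # [(0, 4), (4, 4)]
```

In this model the \s case from bug 581080 terminates with or without the guard, which is consistent with Seo's report that #581080 did not reappear after the patch; the model says nothing about why the guard was added in the first place.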

----------------------------------------------------------------------

>Comment By: Kevin J. Butler (kevinbutler)
Date: 2003-10-03 12:16

Message:
Logged In: YES 
user_id=117665

The above patch does resolve the problem.

The code was introduced in rev 2.85
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_sre.c
to resolve bug 581080
http://sourceforge.net/tracker/index.php?func=detail&aid=581080&group_id=5470&atid=105470
but removing this line does not re-introduce that bug.

Thanks, and kudos to Seo...



----------------------------------------------------------------------
