[ python-Bugs-1116571 ] Wrong match with regex, non-greedy problem

SourceForge.net noreply at sourceforge.net
Tue Feb 8 09:27:03 CET 2005


Bugs item #1116571, was opened at 2005-02-05 01:12
Message generated for change (Comment added) made by effbot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1116571&group_id=5470

Category: Regular Expressions
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: rengel (engel_re)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: Wrong match with regex, non-greedy problem

Initial Comment:
# This is executable.
import re

# My test string is rather long:
tst = ("In this <c:noun:ns>Buch</c:noun>, used to "
       "designate <c:noun:np>Dinge der Wirklichkeit</c:noun> "
       "rather than <c:noun:fs>SW</c:noun> "
       "<c:noun:ns>Ent</c:noun>.")

# I want to match the last part of the string:
# <c:noun:fs>SW</c:noun> <c:noun:ns>Ent</c:noun>
# So I define the following pattern and compile it:
pat = (r"<c:noun:(.*?)>(.*?)</c:noun> "
       r"<c:noun:(.*?)>(.*?)</c:noun>")
rex = re.compile(pat)

# Then I search the string to get a match object:
mat = rex.search(tst)
# If found, print the group
if mat: print(mat.group())

# Instead of
# <c:noun:fs>SW</c:noun> <c:noun:ns>Ent</c:noun>
# I get the whole string starting with
# <c:noun:ns>Buch</c:noun>...
# up to the very last </c:noun>
# Apparently the non-greedy operator doesn't work correctly.
# What's wrong?



----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:27

Message:

Search returns the first (left-most) location where the 
pattern matches, if any.  The non-greedy operator only 
guarantees that you get the shortest possible match at that 
location.
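
The behaviour described above can be checked with a small sketch. It is
not part of the original report or of the reply: the names lazy, strict
and first_match are made up for illustration, and the stricter pattern is
only one possible workaround, assuming that tag attributes never contain
">" and that element text never contains "<".

```python
import re

tst = ("In this <c:noun:ns>Buch</c:noun>, used to "
       "designate <c:noun:np>Dinge der Wirklichkeit</c:noun> "
       "rather than <c:noun:fs>SW</c:noun> "
       "<c:noun:ns>Ent</c:noun>.")

# Pattern from the report: the lazy groups are free to cross tag boundaries.
lazy = re.compile(
    r"<c:noun:(.*?)>(.*?)</c:noun> <c:noun:(.*?)>(.*?)</c:noun>")

# Hypothetical alternative (an assumption, not from the thread):
# [^>]* and [^<]* cannot run past the end of a tag or into the next tag.
strict = re.compile(
    r"<c:noun:([^>]*)>([^<]*)</c:noun> <c:noun:([^>]*)>([^<]*)</c:noun>")


def first_match(rex, text):
    """Return (start offset, matched text) of the leftmost match, or None."""
    mat = rex.search(text)
    return (mat.start(), mat.group()) if mat else None


lazy_start, lazy_text = first_match(lazy, tst)
strict_start, strict_text = first_match(strict, tst)

# The lazy pattern already succeeds at the first <c:noun:...> tag, so the
# search never tries a later starting position.
print(lazy_start, lazy_text)
# The strict pattern fails at the first two tags and matches further right.
print(strict_start, strict_text)
```

With the lazy pattern, the second group has to grow across the
intervening text and tags before the rest of the pattern can match, which
is why the reported match begins at the first tag. The stricter character
classes make the match fail at the earlier tags, so the search moves on to
the adjacent pair.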

----------------------------------------------------------------------
