[Tutor] Question regular expressions - the non-greedy pattern

Walter Prins wprins at gmail.com
Mon Jan 21 17:23:39 CET 2013


Hi,



On 21 January 2013 14:45, Marcin Mleczko <Marcin.Mleczko at onet.eu> wrote:

> Did I get the concept of non-greedy wrong or is this really a bug?


Hugo's already explained the essence of your problem, but just to
add/reiterate:

a) match() only matches at the beginning of the string (the first character)
or not at all.  As specified, your regex does in fact match from the first
character, as shown, so the result is correct.  (As an aside, "<html>" in
"<<html>" does not match *from the beginning of the string*, so it is beside
the point for the match() call.)
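
To make point (a) concrete, here is a minimal sketch. It assumes your original pattern was something like '<.*?>' (your pattern is not quoted in this message, so treat that as a guess), and the print calls are written so they run under both Python 2 and Python 3:

```python
import re

s = '<<html><head><title>Title</title>'
lazy_regex = '<.*?>'  # assumed stand-in for the original pattern

# match() is anchored at index 0, so the leading '<' must be part of the
# match.  The lazy '.*?' then expands one character at a time until the
# first '>' is reached.
m = re.match(lazy_regex, s)
match_text = m.group()
match_span = m.span()

print("group: %s" % match_text)        # group: <<html>
print("span: %s" % (match_span,))      # span: (0, 7)
```

Non-greedy only means "the shortest match from where the match starts"; it does not move the starting position to the right to find a shorter overall match.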

b) Changing your regexp so that the body of the tag *cannot* contain "<",
and then using search() instead, will fix your specific case for you:

import re

s = '<<html><head><title>Title</title>'
tag_regex = '<[^<]*?>'

matchobj = re.match(tag_regex, s)
print "re.match() result:", matchobj # prints None since no match at start of s

matchobj = re.search(tag_regex, s)
# prints something since regex matches at index 1 of string
print "re.search() result:\n",
print "span:", matchobj.span()
print "group:", matchobj.group()
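
If you want every tag rather than just the first one, the same pattern can be handed to re.findall().  This is only a sketch on the sample string above, not a general HTML parser, and the variable names are my own:

```python
import re

s = '<<html><head><title>Title</title>'
tag_regex = '<[^<]*?>'

# search() scans forward and reports the first position where the
# pattern matches; here that is index 1, skipping the stray '<'.
first = re.search(tag_regex, s)
first_span = first.span()

# findall() returns every non-overlapping match, left to right.
all_tags = re.findall(tag_regex, s)

print("first span: %s" % (first_span,))   # first span: (1, 7)
print("all tags: %s" % all_tags)
# all tags: ['<html>', '<head>', '<title>', '</title>']
```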


Walter