regex confusion

John Hunter jdhunter at ace.bsd.uchicago.edu
Tue Dec 9 10:43:24 EST 2003


In trying to debug why a certain regex wasn't working as I expected,
I came across this strange (to me) behavior.  The file I am trying to
match definitely contains many instances of the letter 'a', so I would
expect the regex

  rgxPrev = re.compile('.*?a.*?')

to match the string contents of the file.  But it doesn't.  Here is
a complete example:

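    import re, urllib
    # non-greedy: anything, then an 'a', then anything
    rgxPrev = re.compile('.*?a.*?')

    url = 'http://nitace.bsd.uchicago.edu:8080/files/share/showdown_example2.html'
    s = urllib.urlopen(url).read()   # whole file as one string
    m = rgxPrev.match(s)             # match() is anchored at the start of s
    print m                          # prints None
    print s.find('a')                # prints 48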

m is None (no match), yet s.find('a') reports an 'a' at index 48.
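Since the URL above may no longer be reachable, here is a minimal offline sketch of the same experiment. The sample string is hypothetical, chosen on the assumption that the real file's first 'a' comes after one or more newlines; it is not the actual file contents.

```python
import re

rgxPrev = re.compile('.*?a.*?')

# Hypothetical stand-in for the downloaded file: no 'a' on the first two lines.
s = 'line one\nline two\nbanana\n'

m = rgxPrev.match(s)
print(m)            # None
print(s.find('a'))  # 19
```

With this sample, match() returns None even though find() locates an 'a', mirroring the behavior reported above.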

I read the regex to mean a non-greedy match of anything up to an 'a',
followed by a non-greedy match of anything after that 'a', which this
file should match.
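One likely explanation, assuming the file's first 'a' is preceded by a newline: that reading of the pattern is right for a single line, but by default '.' does not match '\n', so a match() anchored at index 0 cannot reach an 'a' on a later line. The sketch below uses hypothetical sample strings to compare the default behavior with the re.DOTALL flag and with re.search(), which is not anchored at the start.

```python
import re

pattern = '.*?a.*?'

# On a single line the non-greedy reading holds; the lazy trailing .*?
# matches the empty string, so the match ends right after the first 'a'.
single = re.match(pattern, 'xyzabc')
print(single.group())  # xyza

multi = 'line one\nline two\nbanana\n'

# Without flags '.' stops at '\n', so match() anchored at index 0 fails.
plain = re.match(pattern, multi)
print(plain)  # None

# With re.DOTALL, '.' also matches '\n', so the lazy prefix can cross lines.
dotall = re.match(pattern, multi, re.DOTALL)
print(repr(dotall.group()))  # 'line one\nline two\nba'

# search() may start later in the string, on the line holding the 'a'.
searched = re.search(pattern, multi)
print(repr(searched.group()))  # 'ba'
```

Which fix fits depends on intent: re.DOTALL keeps the anchored match but lets it span lines, while re.search() finds the first place the pattern can start.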

Or am I insane?

John Hunter


hunter:~/python/projects/poker/data/pokerroom> uname -a
Linux hunter.paradise.lost 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
hunter:~/python/projects/poker/data/pokerroom> python
Python 2.3.2 (#1, Oct 13 2003, 11:33:15)
[GCC 3.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to rlcompleter2 0.95
for nice experiences hit <tab> multiple times





More information about the Python-list mailing list