maximum recursion in "re" module?

Skip Montanaro skip at pobox.com
Tue Oct 16 10:59:14 EDT 2001


    CJ> On Mon Oct 15 20:13:25 2001, Skip Montanaro wrote:
    >> 
    CJ> startString = ".*?Lo Fi Play</phrase></a> <phrase><a href=\""
    >> ...    
    CJ> mp3String = re.compile(startString, re.I).sub("", mp3String, 1)
    >> 
    >> Why do you need the ".*?" part of the re?  I'd try this:

    CJ> Well, basically what I want to do is get one URL out of the big
    CJ> mess o' HTML I'm trying to pull back.

Okay, I must not have had my thinking cap on earlier.  The fact that all you
need is whatever follows the "Lo Fi Play..." stuff suggests that you should
be able to locate it with simple string functions.   Suppose you have your
input in s and have defined

    startString = 'Lo Fi Play</phrase></a> <phrase><a href="'

Then s.find(startString) will return the offset in s of the start of
startString, or -1 if it isn't found:

    offset = s.find(startString)
    if offset != -1:
        s = s[offset+len(startString):]

Now s should contain the stuff after the first occurrence of startString.
The offset of the first " in s should mark the end of the URL, so

    offset = s.find('"')
    if offset != -1:
        s = s[:offset]

Now s should contain just the URL of interest.
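
Putting the two steps together, a minimal sketch might look like the
following.  The function name extract_url and the sample HTML are made up
for illustration, and returning None when either marker is missing is my
own choice rather than something the snippets above do:

```python
# Marker text that immediately precedes the URL we want.
START = 'Lo Fi Play</phrase></a> <phrase><a href="'


def extract_url(s, start=START):
    """Return the text between `start` and the next double quote.

    Returns None if `start` or the closing quote is not found
    (an assumption made for this sketch).
    """
    offset = s.find(start)
    if offset == -1:
        return None
    begin = offset + len(start)
    # Search for the closing quote only after the marker.
    end = s.find('"', begin)
    if end == -1:
        return None
    return s[begin:end]


# Hypothetical input, just to exercise the function.
sample = ('<p>junk</p>Lo Fi Play</phrase></a> <phrase>'
          '<a href="http://example.com/lofi.m3u">next</a>')
url = extract_url(sample)
```

No regular expression is involved, so there is nothing for the re module
to recurse on, no matter how big the mess o' HTML gets.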

If you really want to do this with regular expressions, I think it would be
easier to explicitly flag the URL like so:

    urlPat = re.compile('Lo Fi Play</phrase></a>'
                        ' <phrase><a href="([^"]*)"')

then:

    mat = urlPat.search(s)
    if mat is not None:
        url = mat.group(1)
    else:
        print("no pattern match for Lo Fi Play...")
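
Wrapped up as a function, that might be sketched as below.  The name
find_url and the sample string are hypothetical.  I've also passed re.I
because your original call did; drop it if the case of the markup is
known to be fixed:

```python
import re

# Same pattern as above; group 1 captures everything up to the closing quote.
# re.I is an assumption carried over from the original sub() call.
URL_PAT = re.compile('Lo Fi Play</phrase></a>'
                     ' <phrase><a href="([^"]*)"', re.I)


def find_url(s):
    """Return the captured URL, or None if the pattern does not match."""
    mat = URL_PAT.search(s)
    if mat is None:
        return None
    return mat.group(1)


# Hypothetical input with different letter case, to show re.I at work.
sample = 'lo fi play</PHRASE></A> <PHRASE><A HREF="http://example.com/a.mp3">'
url = find_url(sample)
```

Because [^"]* can never run past the closing quote, there is no need for
a leading ".*?" to skip over the rest of the page.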

Repeat the Zawinski mantra (stolen from a 1999 c.l.py post by Fredrik
Lundh, the author of Python's current regular expression engine):

    Some people, when confronted with a problem, think
    "I know, I'll use regular expressions".  Now they have
    two problems.
        Jamie Zawinski, on comp.lang.emacs

;-)

-- 
Skip Montanaro (skip at pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/



