problem with my regex?

Brian bnblazer at gmail.com
Mon May 22 17:15:45 EDT 2006


The simple script below raises an error that I am having a hard time
tracking down.  Here is the code:

import urllib
import re
import sys

def getPicLinks():
    found = []
    try:
        page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
    except:
        print "ERROR READING PAGE."
        sys.exit()
    page1 = page.read()
    cetLinks = re.compile("cetaceaPage..\.html", page1)
    for line in page1:
        found.append(cetLinks.findall(line))
    print found

This is the error message:
"/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 396, in _parse
    if state.flags & SRE_FLAG_VERBOSE:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
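
The same kind of TypeError can probably be reproduced without touching the
network. The following is a minimal sketch, assuming the trigger is the
second argument given to re.compile: that parameter is documented as
`flags`, not as the text to search. The helper name
compile_with_text_as_flags and the one-line sample string are made up for
illustration.

```python
import re


def compile_with_text_as_flags(text):
    """Return the TypeError message raised when text is passed as flags.

    Hypothetical helper: it mirrors the call shape used in the script,
    re.compile(pattern, page_text), on a small string.
    """
    try:
        # The second positional argument of re.compile is `flags`, so the
        # string ends up in a bitwise test against integer flag constants.
        re.compile(r"cetaceaPage..\.html", text)
    except TypeError as exc:
        return str(exc)
    return None


message = compile_with_text_as_flags('<A HREF="cetaceaPage01.html">01</A>')
print(message)
```

If this assumption holds, the unfamiliar sre_parse.py in the traceback is
simply where the re module first tries to use the flags value.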

I am trying to extract the links on a web page that have a similar
pattern.  Here is an example of the html source:

<HR>
<P><SMALL><A HREF="photoLog.html">PHOTO-LOG</A><br>
<A HREF="guide.html">How-To-Submit</A><BR><A
HREF="cetaceaPage01.html">01</A> | <A
HREF="cetaceaPage02.html">02</A> | <A
HREF="cetaceaPage03.html">03</A> | <A
HREF="cetaceaPage04.html">04</A> | <A
HREF="cetaceaPage05.html">05</A> | <A
HREF="cetaceaPage06.html">06</A> | <A
HREF="cetaceaPage07.html">07</A> | <A
HREF="cetaceaPage08.html">08</A> | <A
HREF="cetaceaPage09.html">09</A> | <A
HREF="cetaceaPage10.html">10</A>
<BR><A>
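
To make the goal concrete, here is a minimal sketch of the intended
matching, run on a shortened copy of the markup above rather than on the
live page (which may differ). It assumes the pattern alone is passed to
re.compile and the whole page text is passed to findall; the names
SAMPLE_HTML, CET_LINK and find_cetacea_links are illustrative only.

```python
import re

# Shortened copy of the navigation markup quoted above.
SAMPLE_HTML = '''<A HREF="guide.html">How-To-Submit</A><BR><A
HREF="cetaceaPage01.html">01</A> | <A
HREF="cetaceaPage02.html">02</A> | <A
HREF="cetaceaPage10.html">10</A>'''

# Only the pattern goes to re.compile; the text to search goes to findall.
CET_LINK = re.compile(r"cetaceaPage..\.html")


def find_cetacea_links(html):
    """Return every cetaceaPageNN.html file name found in html."""
    # findall scans the whole string at once, so there is no need to
    # loop over it (looping over a string yields single characters).
    return CET_LINK.findall(html)


links = find_cetacea_links(SAMPLE_HTML)
print(links)
# prints ['cetaceaPage01.html', 'cetaceaPage02.html', 'cetaceaPage10.html']
```

The sketch uses print(...) with a single argument so that it runs the same
way under Python 2.4 and later versions.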

My problem is that I can't figure out what is going wrong here, mostly
because the error message confuses me: it points to a file (presumably
part of re) that I am unfamiliar with, and I am fairly new to Python.

Any help is greatly appreciated, as is your patience.

Brian



