problem with my regex?

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Tue May 23 03:00:01 CEST 2006


Brian wrote:
> I have a simple script below that is causing me some problems and I am
> having a hard time tracking them down.  Here is the code:
> 
> import urllib
> import re
> 
> def getPicLinks():
>     found = []
>     try:
>         page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
>     except:
Do everyone a favor: don't use bare except clauses.

>         print "ERROR RREADING PAGE."
>         sys.exit()

stdout is for normal program output; error messages should go to
stderr. And FWIW, your exception handling here is worse than useless.
You'd be better off letting the exception propagate: at worst, it will
also exit the program, but with the right exit status for the system
and a meaningful traceback.

>     page1 = page.read()
>     cetLinks = re.compile("cetaceaPage..\.html", page1)

Are you sure you've carefully read the docs for re.compile() ?-)

You want something like this (NB: regexp not tested):

       import re
       import urllib

       page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
       html = page.read()
       page.close()  # don't forget to free resources
       cetLinks = re.compile(r"cetaceaPage[0-9]{2}\.html")
       found = cetLinks.findall(html)
       print "\n".join(found)
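The same idea in Python 3, as a sketch that runs without network access:
the page content is replaced by a hypothetical HTML snippet, and
find_pic_links() is an illustrative name, not part of the original
script. Like the untested regexp above, the pattern assumes the page
names end in exactly two digits.

```python
import re

# Compile the pattern once; re.compile() takes only the pattern and optional flags.
CETACEA_LINK = re.compile(r"cetaceaPage[0-9]{2}\.html")


def find_pic_links(html):
    """Return every cetaceaPageNN.html name found in the HTML text."""
    return CETACEA_LINK.findall(html)


# Hypothetical sample markup standing in for the real page.
sample = '<a href="cetaceaPage01.html">1</a> <a href="cetaceaPage12.html">12</a>'
print("\n".join(find_pic_links(sample)))
```

findall() returns the matches in the order they appear in the text, so
this prints cetaceaPage01.html and cetaceaPage12.html on separate lines.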

> This is the error message:
> "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py",
> line 396, in _parse
>     if state.flags & SRE_FLAG_VERBOSE:
> TypeError: unsupported operand type(s) for &: 'str' and 'int'

This is not the *full* traceback.

(snip)

> My problem is that I can't seem to be able to figure out what is going
> wrong here.

What's going wrong is that you are passing the html page content as the
second argument to re.compile(), instead of an integer value
representing a combination of flags (see the doc for the re module).
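A quick way to see the difference, as a Python 3 sketch (the html
string is a made-up stand-in for the page): the second positional
argument of re.compile() must be flags such as re.IGNORECASE, combined
with |, and passing a string there fails with the same kind of TypeError
as in your traceback.

```python
import re

html = '<a href="cetaceaPage01.html">Page 1</a>'  # hypothetical page content

# Wrong: the page text lands in the *flags* parameter of re.compile().
try:
    re.compile(r"cetaceaPage..\.html", html)
except TypeError as exc:
    print("TypeError:", exc)

# Right: pass only the pattern, plus optional integer flags combined with |.
# IGNORECASE lets "cetacea[pP]age" match either case; ASCII restricts \d to 0-9.
pattern = re.compile(r"cetaceapage\d{2}\.html", re.IGNORECASE | re.ASCII)
print(pattern.findall(html))  # ['cetaceaPage01.html']
```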

>  Mostly because I am a bit confused by the error message as
> it points to a file (presumable part of re)

It is.

The last lines of the traceback show the file and line where the
exception was raised, followed by the exception's message. Above them
is the whole call stack, including the line where you called
re.compile() with the wrong arguments. Tracebacks are really useful
once you know how to read them.
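To see this for yourself, here is a small Python 3 sketch that walks the
traceback with the standard traceback module; get_pic_links() is a
hypothetical stand-in that repeats the original mistake. The first
frames are in your own code, the last ones inside the re module.
traceback.print_exc() would print the same full traceback, which is
what's worth pasting when asking for help.

```python
import re
import traceback


def get_pic_links(html):
    # Hypothetical reproduction of the bug: page text passed as flags.
    return re.compile(r"cetaceaPage..\.html", html)


try:
    get_pic_links("<html>...</html>")
except TypeError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    # Outermost call first, innermost frame (where the error was raised) last.
    for frame in frames:
        print(f"{frame.filename}:{frame.lineno} in {frame.name}: {frame.line}")
```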

HTH



More information about the Python-list mailing list