Regular Expression question

Paul McGuire ptmcg at
Wed Jun 7 19:22:03 EDT 2006

<ken.carlino at> wrote in message
news:1149714949.542234.148800 at
> Hi,
> I am new to Python regular expressions. I would like to use them to get an
> attribute of an HTML element from an HTML file.
> For example, I was able to read the HTML file using this:
>     req = urllib2.Request(url=acaURL)
>     f = urllib2.urlopen(req)
>     data = f.read()
> My question is: how can I get just the src attribute value of an img
> tag? Something like this:
> (.*)<img src="href of the image source">(.*)
> I need to get the URL of the image source.
> Thanks.
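For reference, the pattern in the question can be turned into a working regular expression. This is a minimal sketch, assuming Python 3 and its standard re module; SAMPLE_HTML, IMG_SRC_RE and find_img_srcs are hypothetical names used only for illustration. The leading and trailing (.*) groups are dropped because re.finditer already scans the whole text for every match; a greedy leading (.*) would make a single search settle on the last <img> tag it can reach rather than the first. The pattern accepts single- or double-quoted src values in any letter case, but not unquoted values or other HTML quirks, which is one reason a real parser is usually safer.

```python
import re

# Hypothetical sample input; in the question this would be the downloaded page text.
SAMPLE_HTML = '''
<p>Logo: <img src="/images/logo.png" alt="logo"></p>
<IMG class="photo" SRC='photo.jpg'>
<img alt="no source here">
'''

# Match an <img ...> tag and capture its src value. Requiring whitespace
# before "src" avoids matching attributes such as data-src. Group 1 is the
# opening quote, and \1 requires the same quote character to close the value.
IMG_SRC_RE = re.compile(
    r"""<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def find_img_srcs(html):
    """Return the quoted src values of all <img> tags found in html."""
    return [m.group(2) for m in IMG_SRC_RE.finditer(html)]


for src in find_img_srcs(SAMPLE_HTML):
    print(src)
```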

As Fredrik pointed out, re's are not the only tool out there.  Here's a
pyparsing solution.

-- Paul

import pyparsing
import urllib

# define HTML tag format using makeHTMLTags helper
# (we don't really care about the ending </img> tag,
# even though makeHTMLTags returns definitions for both
# starting and ending tag patterns)
imgStartTag, dummy = pyparsing.makeHTMLTags("img")

# get HTML source from some web site
htmlPage = urllib.urlopen("")
htmlSource = htmlPage.read()
htmlPage.close()

# scan HTML source, printing SRC attribute from each <img> tag
for tokens,start,end in imgStartTag.scanString(htmlSource):
    print tokens.src
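The pyparsing version above depends on the third-party pyparsing package and, as written, on Python 2 (urllib.urlopen and the print statement). As another non-regex option, here is a minimal sketch that uses only the standard library, assuming Python 3's html.parser module (the same class lives in the HTMLParser module in Python 2). ImgSrcCollector, img_srcs and SAMPLE_HTML are hypothetical names for illustration; the sample string stands in for the downloaded page.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <IMG SRC=...>
        # and <img src=...> are treated alike. attrs is a list of
        # (name, value) pairs; value is None for a bare attribute like <img src>.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.srcs.append(value)

    # A self-closing <img ... /> goes through handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


def img_srcs(html_text):
    """Return the src values of all <img> tags in html_text."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.srcs


SAMPLE_HTML = ('<p><img src="/images/logo.png" alt="logo"></p>'
               '<IMG SRC=photo.jpg><img src="icon.gif" />')
print(img_srcs(SAMPLE_HTML))
```

Because the parser handles attribute quoting and letter case itself, <IMG SRC=photo.jpg> needs no special treatment.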

