Regular Expression question

Paul McGuire ptmcg at
Thu Jun 8 00:49:34 EDT 2006

"Frank Potter" < at> wrote in message
news:mailman.6720.1149730752.27775.python-list at
> pyparsing is cool.
> but using only re is also OK
> # -*- coding: UTF-8 -*-
> import urllib2
> html=urllib2.urlopen(ur"").read()
> import re
> r=re.compile('<img\s+src="(?P<image>[^"]+)"[^>]*>',re.IGNORECASE)
> for m in r.finditer(html):
>     print m.group('image')

Ouch - this fails to match any <img> tag that has some other attribute, such
as "height" or "width", before the "src" attribute. has
several such tags.
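
To make the failure concrete, here is a minimal sketch in current Python. The
sample markup is made up for illustration and is not taken from any real page.
The "loose" pattern is my own variation, not something from the quoted post,
and it is still brittle: it assumes double-quoted values and would also match
an attribute such as data-src.

```python
import re

# The pattern from the quoted message: "src" must directly follow "<img".
strict = re.compile(r'<img\s+src="(?P<image>[^"]+)"[^>]*>', re.IGNORECASE)

# Hypothetical variation: allow other attributes before "src".
loose = re.compile(r'<img\b[^>]*?\bsrc="(?P<image>[^"]+)"[^>]*>', re.IGNORECASE)

# Made-up sample markup; the second tag puts "height" and "width" first.
html = (
    '<img src="a.gif" alt="first">'
    '<IMG height="10" width="20" src="b.gif">'
)

strict_hits = [m.group('image') for m in strict.finditer(html)]
loose_hits = [m.group('image') for m in loose.finditer(html)]

print(strict_hits)  # ['a.gif']
print(loose_hits)   # ['a.gif', 'b.gif']
```

The strict pattern silently skips the second tag, which is exactly the kind of
miss that is easy to overlook when scraping a real page.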

On the other hand, pyparsing's makeHTMLTags defines a starting tag
expression that looks for (conceptually):

    < tagname ZeroOrMore(attrname '=' value) Optional('/') >

and does not assume that the first attribute is "src", or anything else for that
matter.

The returned results make the tag attributes accessible as object attributes
or dictionary keys.
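
A minimal sketch of that usage, assuming pyparsing is installed and using
made-up sample markup rather than a fetched page. makeHTMLTags is the name
used above; I believe recent pyparsing releases also spell it make_html_tags.
The variable names are only for illustration.

```python
from pyparsing import makeHTMLTags

# makeHTMLTags returns a pair of expressions: (opening tag, closing tag).
img_start, img_end = makeHTMLTags("img")

# Made-up sample markup; the second tag has "src" as its last attribute.
html = (
    '<img src="a.gif" alt="first">'
    '<IMG height="10" width="20" src="b.gif">'
)

sources = []
heights = []
for tokens, start, end in img_start.scanString(html):
    # Attribute access and dictionary-style access reach the same value.
    sources.append(tokens.src)
    assert tokens["src"] == tokens.src
    # Attributes that are absent from a tag can be checked with "in".
    if "height" in tokens:
        heights.append(tokens["height"])

print(sources)
print(heights)
```

Both tags should be picked up regardless of where "src" appears, and the other
attributes come along for free if you need them.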

-- Paul
