Regular Expression question
ptmcg at austin.rr._bogus_.com
Thu Jun 8 00:49:34 EDT 2006
"Frank Potter" <could.net at gmail.com> wrote in message
news:mailman.6720.1149730752.27775.python-list at python.org...
> pyparsing is cool.
> but using only re is also OK
> # -*- coding: UTF-8 -*-
> import urllib2
> import re
> for m in r.finditer(html):
>     print m.group('image')
Ouch - this fails to match any <img> tag that has some other attribute, such
as "height" or "width", before the "src" attribute. www.yahoo.com has
several such tags.
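For comparison, a re-only pattern can at least skip over attributes that come before "src". This is a minimal sketch that assumes Python 3 and only the standard library re module. IMG_SRC and img_sources are illustrative names, and the pattern is not the regex from the quoted post:

```python
import re

# Illustrative pattern (an assumption, not the quoted poster's regex):
#   <img\b      the tag name as a whole word (case-insensitive below)
#   [^>]*?      lazily skip any attributes (height, width, ...) before src
#   \ssrc       "src" preceded by whitespace, so "data-src" is not taken
#   \s*=\s*     optional whitespace around '='
#   value       double-quoted, single-quoted, or unquoted
IMG_SRC = re.compile(
    r"""<img\b[^>]*?\ssrc\s*=\s*(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^\s>]+))""",
    re.IGNORECASE,
)

def img_sources(html):
    """Return the src value of every <img> tag in html, in document order."""
    sources = []
    for m in IMG_SRC.finditer(html):
        # exactly one of the three value alternatives participated in the match
        sources.append(next(v for v in m.group("dq", "sq", "bare") if v is not None))
    return sources

sample = '<IMG height="10" width="20" src="a.gif"><img alt=x src=b.png>'
print(img_sources(sample))  # ['a.gif', 'b.png']
```

It still has the usual regex blind spots: a literal ">" inside a quoted attribute value stops the scan early, and text like "src=" inside another attribute's quoted value can be matched by mistake.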
On the other hand, pyparsing's makeHTMLTags defines a starting tag
expression that looks for (conceptually):
< tagname ZeroOrMore(attrname '=' value) Optional('/') >
and does not assume that the first attribute is "src", or anything else for that
matter.
The returned results make the tag attributes accessible as object attributes
or dictionary keys.
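To make the idea concrete without requiring pyparsing, here is a rough standard-library sketch of the same conceptual grammar, with results readable both as attributes and as dictionary keys. TagAttrs, start_tag_pattern, parse_attrs and find_tags are hypothetical names invented for this illustration; they are not pyparsing's API, and pyparsing returns its own results object rather than this dict subclass. Python 3 is assumed:

```python
import re

class TagAttrs(dict):
    """Hypothetical stand-in for pyparsing's results: tag["src"] and tag.src both work.

    Attribute names that collide with dict methods (keys, items, ...) still
    resolve to the method on attribute access; use item access for those.
    """
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

# value: double-quoted, single-quoted, or a bare run of characters
VALUE = r"""(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+)"""
# attrname '=' value, with optional whitespace around '='
ATTR = r"[A-Za-z_:][-\w:.]*\s*=\s*" + VALUE
# same shape as ATTR, but with groups capturing the name and the value
ATTR_CAPTURE = re.compile(
    r"""([A-Za-z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+))"""
)

def start_tag_pattern(tagname):
    # < tagname ZeroOrMore(attrname '=' value) Optional('/') >
    return re.compile(
        r"<\s*" + re.escape(tagname)
        + r"(?P<attrs>(?:\s+" + ATTR + r")*)\s*/?\s*>",
        re.IGNORECASE,
    )

def parse_attrs(text):
    attrs = TagAttrs()
    for m in ATTR_CAPTURE.finditer(text):
        name, dq, sq, bare = m.groups()
        # exactly one of the three value alternatives matched; names are lowercased
        attrs[name.lower()] = next(v for v in (dq, sq, bare) if v is not None)
    return attrs

def find_tags(html, tagname):
    """Return a TagAttrs for every <tagname ...> start tag, attributes in any order."""
    pattern = start_tag_pattern(tagname)
    return [parse_attrs(m.group("attrs")) for m in pattern.finditer(html)]

html = '<img height="20" width="30" src="logo.gif"><IMG SRC=\'a.png\' alt="x" />'
for tag in find_tags(html, "img"):
    print(tag.src, tag["src"])
# logo.gif logo.gif
# a.png a.png
```

Like the conceptual grammar, this sketch only accepts attributes written as name=value, so a tag containing a valueless attribute such as "disabled" is skipped entirely; pyparsing's makeHTMLTags is more forgiving than this simplification.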