Retrieve url's of all jpegs at a web page URL
Paul McGuire
ptmcg at austin.rr.com
Wed Sep 16 01:12:51 EDT 2009
On Sep 15, 11:32 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> Also untested:
>
> from lxml import html
>
> doc = html.parse(page_url)
> doc.make_links_absolute(page_url)
>
> urls = [ img.src for img in doc.xpath('//img') ]
>
> Then use e.g. urllib2 to save the images.
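That looks much like what a pyparsing approach would be:

from pyparsing import makeHTMLTags, htmlComment
import urllib

# url is the address of the page to scan
html = urllib.urlopen(url).read()
# makeHTMLTags returns (openTag, closeTag) expressions; only the opening tag is needed
imgTag = makeHTMLTags("img")[0]
# skip <img> tags that appear inside HTML comments
imgTag.ignore(htmlComment)
urls = [ img.src for img in imgTag.searchString(html) ]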
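
The list built above holds every img src exactly as written in the page, so it can include relative paths and non-JPEG images. A minimal sketch of narrowing it down, assuming Python 3 (where the URL helpers live in urllib.parse) and assuming that a .jpg or .jpeg file extension is a good enough test for "is a JPEG". The names jpeg_urls and JPEG_EXTENSIONS are made up for this illustration:

```python
from urllib.parse import urljoin, urlparse

# Assumption: JPEGs are recognized by file extension only.
JPEG_EXTENSIONS = (".jpg", ".jpeg")

def jpeg_urls(page_url, srcs):
    """Return absolute URLs for the entries in srcs that look like JPEGs."""
    result = []
    for src in srcs:
        # Resolve relative paths against the page they came from.
        absolute = urljoin(page_url, src)
        # Test only the path, so a query string or fragment does not hide the extension.
        path = urlparse(absolute).path
        if path.lower().endswith(JPEG_EXTENSIONS):
            result.append(absolute)
    return result
```

With the names from the snippet above, the call would be jpeg_urls(url, urls). Images served from URLs without a JPEG extension would be missed by this filter; checking the Content-Type header of each response would be a stricter test.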
-- Paul