Retrieve url's of all jpegs at a web page URL

Paul McGuire ptmcg at austin.rr.com
Wed Sep 16 01:12:51 EDT 2009


On Sep 15, 11:32 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> Also untested:
>
>         from lxml import html
>
>         doc = html.parse(page_url)
>         doc.make_links_absolute(page_url)
>
>         urls = [ img.src for img in doc.xpath('//img') ]
>
> Then use e.g. urllib2 to save the images.

Looks similar to what a pyparsing approach would look like:

    from pyparsing import makeHTMLTags, htmlComment

    import urllib
    html = urllib.urlopen(url).read()

    imgTag = makeHTMLTags("img")[0]
    imgTag.ignore(htmlComment)

    urls = [ img.src for img in imgTag.searchString(html) ]

-- Paul



More information about the Python-list mailing list