Retrieve URLs of all JPEGs at a web page URL

Chris Rebert clp2 at rebertia.com
Tue Sep 15 19:43:23 EDT 2009


On Tue, Sep 15, 2009 at 7:28 AM, grimmus <graham.colmer at gmail.com> wrote:
> Hi,
>
> I would like to achieve something like Facebook has when you post a
> link. It shows images located at the URL you entered so you can choose
> which one to display as a summary.
>
> I was thinking I could loop through the HTML of a page with a regex
> and store all the JPEG URLs in an array. Then I could open the
> images one by one and save them as thumbnails with something like
> below.

0. Install BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/)

1. Fetch the page and loop over its img tags:
#untested
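from contextlib import closing
import urllib

from BeautifulSoup import BeautifulSoup

page_url = "http://the.url.here"

# In Python 2, urlopen() objects are not context managers, hence closing()
with closing(urllib.urlopen(page_url)) as f:
    soup = BeautifulSoup(f.read())
for img_tag in soup.findAll("img"):
    # Attribute lookup; img_tag.src would search for a child <src> tag instead
    relative_url = img_tag.get("src")
    if not relative_url:
        continue  # skip <img> tags that have no src attribute
    img_url = make_absolute(relative_url, page_url)
    save_image_from_url(img_url)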

2. Write make_absolute() and save_image_from_url()

3. Profit.
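
The two helpers from step 2 could be sketched roughly as follows. This is only one possible take, not a tested implementation: is_jpeg() and the dest_dir parameter are hypothetical additions that are not in the snippet above, the JPEG check looks only at the file extension, and thumbnailing is left out. The try/except import is there so the same code should run on both Python 2 and Python 3.

```python
import os
import posixpath
import shutil

try:  # Python 3
    from urllib.parse import urljoin, urlsplit
    from urllib.request import urlopen
except ImportError:  # Python 2
    from urlparse import urljoin, urlsplit
    from urllib2 import urlopen


def make_absolute(relative_url, page_url):
    """Resolve an <img> src against the URL of the page it came from."""
    # urljoin() returns relative_url unchanged if it is already absolute.
    return urljoin(page_url, relative_url)


def is_jpeg(img_url):
    """Hypothetical helper: guess from the extension of the URL's path."""
    # Only the path is inspected, so a query string does not get in the way.
    ext = posixpath.splitext(urlsplit(img_url).path)[1]
    return ext.lower() in (".jpg", ".jpeg")


def save_image_from_url(img_url, dest_dir="."):
    """Download img_url into dest_dir and return the local file path."""
    # Assumption: the last path segment is good enough as a file name.
    # Images with the same name will overwrite each other.
    filename = posixpath.basename(urlsplit(img_url).path) or "image.jpg"
    local_path = os.path.join(dest_dir, filename)
    response = urlopen(img_url)
    try:
        with open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
    finally:
        response.close()
    return local_path
```

In the loop from step 1 you would then call save_image_from_url(img_url) only when is_jpeg(img_url) is true. Checking the extension is a shortcut; a server can serve a JPEG from any URL, so inspecting the Content-Type header of the response would be more reliable.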

Cheers,
Chris
--
http://blog.rebertia.com


