[Image-SIG] urlopen for webpage image file

Fredrik Lundh fredrik@pythonware.com
Sat, 27 Jul 2002 14:10:12 +0200


Richard Boyd wrote:

> I need to dynamically access a webpage and find the largest
> graphic used on that page, so I tried the following ...

largest as in "number of pixels", or "number of bytes"?

if you want to get the size in bytes, you don't need to decode
the image at all; just grab the data, and use the len() function
to get the number of bytes.

    import urllib
    file = urllib.urlopen(uri)
    print len(file.read())
    file.close()

if you don't want to load the entire image, check the header
instead:

    file = urllib.urlopen(uri)
    print file.headers.get("content-length")
    file.close()
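
on current Python 3, urlopen lives in urllib.request.  a minimal
sketch of the same header check, assuming the server may leave the
header out or send junk in it; parse_content_length and
size_in_bytes are hypothetical helper names, not part of urllib:

```python
import urllib.request

def parse_content_length(value):
    # turn a Content-Length header value into an int
    # (None if the header is missing or malformed)
    if value is None:
        return None
    value = value.strip()
    if not (value.isascii() and value.isdigit()):
        return None
    return int(value)

def size_in_bytes(uri):
    # prefer the header; fall back to reading the body if it's not usable
    with urllib.request.urlopen(uri) as response:
        size = parse_content_length(response.headers.get("Content-Length"))
        if size is None:
            size = len(response.read())
        return size
```

the fallback costs a full download, so it only kicks in when the
header can't be used.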

urlopen issues an HTTP GET request (or POST, if you pass it data),
so the server starts sending the image itself; to be a bit nicer,
you can use the HTTP HEAD method instead.  this helper
shows you how to do that:

import httplib, urlparse

def getsize(uri):

    # check the uri
    scheme, host, path, params, query, fragment = urlparse.urlparse(uri)
    if scheme != "http":
        raise ValueError("only supports HTTP requests")
    if not path:
        path = "/"
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query

    # make a http HEAD request
    h = httplib.HTTP(host)
    h.putrequest("HEAD", path)
    h.putheader("Host", host)
    h.endheaders()

    status, reason, headers = h.getreply()

    h.close()

    return headers.get("content-length")

print getsize("http://www.pythonware.com/images/small-yoyo.gif")
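
for Python 3, where httplib and urlparse became http.client and
urllib.parse, the same idea might look like the sketch below.
request_target is a hypothetical helper name, and the 10 second
timeout is an assumption, not something the original helper sets:

```python
import http.client
import urllib.parse

def request_target(uri):
    # split an http uri into (host, request path)
    parts = urllib.parse.urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("only supports HTTP requests")
    path = parts.path or "/"  # urlsplit leaves any ";params" in the path
    if parts.query:
        path = path + "?" + parts.query
    return parts.netloc, path

def getsize(uri):
    # make an HTTP HEAD request and return the Content-Length header
    # as a string (None if the server didn't send one)
    host, path = request_target(uri)
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)  # request() adds the Host header itself
        response = conn.getresponse()
        return response.getheader("Content-Length")
    finally:
        conn.close()
```

like the helper above, this doesn't follow redirects and doesn't
check the status code.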

:::

if you want to get the size in pixels, use the ImageFile.Parser
class or the ImageFileIO module (or read the whole thing, and wrap
it in a StringIO object).

the following helper returns both the size of the file, and the size
of the image, usually without loading more than 1k or so.

import urllib
import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size: size = int(size)
    p = ImageFile.Parser()
    while 1:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            # header decoded; close the connection before returning
            file.close()
            return size, p.image.size
    file.close()
    return size, None

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
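
why is the first chunk usually enough?  GIF and PNG both store the
dimensions in the first couple of dozen bytes.  a rough stdlib-only
sketch (Python 3) that shows this; peek_size is a hypothetical
helper, not part of PIL, and this is not how ImageFile.Parser works
internally:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def peek_size(data):
    # return (width, height) from the leading bytes of a GIF or PNG
    # file, or None if the format isn't recognized or data is too short
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # GIF: logical screen width/height, 16-bit little-endian
        return struct.unpack("<HH", data[6:10])
    if data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR" and len(data) >= 24:
        # PNG: IHDR is the first chunk; width/height, 32-bit big-endian
        return struct.unpack(">II", data[16:24])
    return None
```

you could pass it the first 1024 bytes read from the connection;
for anything it doesn't recognize, the parser-based helper above is
the better tool.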

> What I had to do was switch to REBOL [www.rebol.com]. It is very
> simple in REBOL ...
>
>   print size? http://www.python.org/pics/PythonPoweredSmall.gif
> 
> returns the size ... 
>         361 

oh, you meant bytes.  why didn't you say so from the start? ;-)

(are you sure that size? doesn't load the whole thing, btw?)

</F>