[Tutor] Download file via HTTP GET with progress monitoring & custom headers?

Tue Aug 12 00:47:35 CEST 2008

Thanks Kent!
Here's what I got, works pretty well. :)
import urllib2
from urllib import ContentTooShortError  # lives in urllib, not urllib2

#functions
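def reportDownloadProgress(blocknum, bs, size):
    # size is -1 when the server sent no Content-Length header
    done = blocknum * bs
    if size > 0:
        done = min(done, size)  # the last block is usually a partial one
        print '%d/%d downloaded | %d%%' % (done, size, done * 100 / size)
    else:
        print '%d downloaded' % done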

def httpDownload(url, filename, headers=None, reporthook=None, postData=None):
    # urllib2.Request expects a dict of headers, so swap None for {}
    reqObj = urllib2.Request(url, postData, headers or {})
    fp = urllib2.urlopen(reqObj)
    headers = fp.info()
    ##    urlopen() returns a file-like object with two additional methods:
    ##
    ##    * geturl() -- return the URL of the resource retrieved
    ##    * info() -- return the meta-information of the page, as a
    ##      dictionary-like object
    ##
    ##    Raises URLError on errors.
    ##
    ##    Note that None may be returned if no handler handles the request
    ##    (though the default installed global OpenerDirector uses
    ##    UnknownHandler to ensure this never happens).

    #read & write fileObj to filename
    tfp = open(filename, 'wb')
    result = filename, headers
    bs = 1024*8
    size = -1
    read = 0
    blocknum = 0

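    # read Content-Length even when there is no reporthook, so the size
    # check after the loop still runs
    if "content-length" in headers:
        size = int(headers["Content-Length"])
    if reporthook:
        reporthook(blocknum, bs, size)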

    while 1:
        block = fp.read(bs)
        if block == "":
            break
        read += len(block)
        tfp.write(block)
        blocknum += 1
        if reporthook:
            reporthook(blocknum, bs, size)

    fp.close()
    tfp.close()
    del fp
    del tfp

    # raise exception if actual size does not match content-length header
    if size >= 0 and read < size:
        raise ContentTooShortError("retrieval incomplete: got only %i out "
                                    "of %i bytes" % (read, size), result)

    return result

url = 'http://akvideos.metacafe.com/ItemFiles/%5BFrom%20www.metacafe.com%5D%20292662.2155544.11.flv'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Accept': 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    'Accept-Language': 'fr-fr,en-us;q=0.7,en;q=0.3',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'
}

#test it
httpDownload(url, 'testDownload.flv', headers, reportDownloadProgress)
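
For anyone reading this on Python 3, where urllib2 was split into urllib.request and urllib.error, a rough sketch of the same idea might look like the following. The names copy_with_progress, http_download and report are made up for this example and are not library functions. The demo at the bottom feeds the copy loop an in-memory stream rather than a real server, so it makes no network request.

```python
import io
import urllib.error
import urllib.request


def copy_with_progress(src, dst, reporthook=None, size=-1, bs=8 * 1024):
    """Copy src to dst in blocks of bs bytes and return the byte count.

    reporthook, if given, is called as reporthook(blocknum, bs, size)
    once before the first read and once after every block written.
    """
    read = 0
    blocknum = 0
    if reporthook:
        reporthook(blocknum, bs, size)
    while True:
        block = src.read(bs)
        if not block:  # an empty bytes object means end of stream
            break
        read += len(block)
        dst.write(block)
        blocknum += 1
        if reporthook:
            reporthook(blocknum, bs, size)
    return read


def http_download(url, filename, headers=None, reporthook=None, post_data=None):
    """Hypothetical Python 3 counterpart of httpDownload() above."""
    req = urllib.request.Request(url, data=post_data, headers=headers or {})
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as out:
        info = resp.headers
        length = info.get("Content-Length")
        size = int(length) if length is not None else -1  # -1: unknown
        read = copy_with_progress(resp, out, reporthook, size)
    result = filename, info
    if size >= 0 and read < size:
        raise urllib.error.ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes" % (read, size),
            result)
    return result


def report(blocknum, bs, size):
    """Print progress; size is -1 when the total length is unknown."""
    done = blocknum * bs
    if size > 0:
        done = min(done, size)  # the last block is usually a partial one
        print("%d/%d downloaded | %d%%" % (done, size, done * 100 // size))
    else:
        print("%d downloaded" % done)


# Local demo: copy 20000 bytes between two in-memory streams.
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
total = copy_with_progress(source, target, report, size=20000)
```

Keeping the copy loop separate from the urlopen() call means the progress logic can be tried on any file-like object, which is handy when the server is slow or unavailable.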

On Sun, Aug 10, 2008 at 6:42 PM, Kent Johnson <kent37 at tds.net> wrote:

> On Sun, Aug 10, 2008 at 6:09 PM, xbmuncher <xboxmuncher at gmail.com> wrote:
> > I want to download a file via HTTP GET (perhaps HTTP POST in the
> > future; whatever method I use should allow relatively easy changes
> > to use POST later on).
> > I need to send custom headers, like with urllib2.Request().
> > I need to be able to monitor the download progress WHILE it downloads,
> > like the hook function in urllib.
> >
> > I looked into urllib2 and it did everything except allow me to check the
> > progress of the download during the download; it only downloads the file
> > first with urlopen(). I also tried urllib, and the reporthook function is
> > great, except I can't send custom headers! I wish I could send a REQ
> > object to the urlretrieve() function in urllib; that way I could prepare
> > those custom headers.. :(
> >
> > Anyway, I would appreciate it if someone could write me a quick example
> > to do this, or point me to the right library/funcs to do it. I want to
> > be able to do it with native Python libraries.
>
> urllib2.urlopen() doesn't read the data from the remote site, it
> returns a file-like object with a read() method. You can read the data
> in chunks and call a reporthook yourself.
>
> Take a look at the source to URLopener.retrieve() in urllib.py for some
> hints.
>
> Kent
>
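
On the "perhaps POST later" point in the original question: with a Request object, the switch is mostly a matter of passing a data argument, which is what the postData parameter of httpDownload() is for. A minimal Python 3 sketch follows (urllib.request is the Python 3 home of urllib2). The example.com URL and the "id" form field are placeholders, and nothing is sent over the network here; the Request objects are only built and inspected.

```python
import urllib.parse
import urllib.request

headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

# No data argument: the request will be sent as a GET.
get_req = urllib.request.Request("http://example.com/file", headers=headers)

# With a data argument (bytes in Python 3), the request becomes a POST.
form = urllib.parse.urlencode({"id": "123"}).encode("ascii")
post_req = urllib.request.Request("http://example.com/file", data=form,
                                  headers=headers)

print(get_req.get_method())
print(post_req.get_method())

# Request stores header names capitalized, so look them up as "User-agent".
print(post_req.get_header("User-agent"))
```

Either object can then be handed to urlopen() and read in chunks exactly as before, so the progress reporting does not change between GET and POST.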