Downloading huge files via urllib

Jeff Davis jdavis at empires.org
Tue Sep 24 05:41:30 EDT 2002


Hi,

It's certainly possible. urllib is meant as a high-level interface that 
hides those kinds of details. Perhaps you could find a way to make it 
work, but I would suggest using the socket module instead: create a 
socket yourself and wrap it with SSL. Then you can read an arbitrary 
number of bytes from the socket at a time. That way, you can be sure 
never to take in more than, say, 500 bytes at a time, preventing your 
MemoryError.
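
Something along these lines might work (an untested sketch; the host 
and path are placeholders for your server, and since the end-of-stream 
behavior of socket.ssl has varied between versions, both an empty read 
and an sslerror are treated as the end of the data):

import socket

host = 'www.example.com'       # placeholder: your server here
path = '/run.log'              # placeholder: your file here

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, 443))
ssl_sock = socket.ssl(sock)    # no certificate verification!

# Speak just enough HTTP by hand to request the file.
ssl_sock.write('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host))

out = open('run.log', 'wb')
while 1:
    try:
        chunk = ssl_sock.read(500)    # at most 500 bytes per read
    except socket.sslerror:
        break    # some versions signal end-of-stream this way
    if not chunk:
        break
    out.write(chunk)
out.close()
sock.close()

Note that the saved file still starts with the HTTP response headers; 
you'd want to strip everything up to the first blank line before 
parsing.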

Note that there is a warning in the docs for socket.ssl():
"Warning: This does not do any certificate verification! "

I think that just means it doesn't check the certificate against a 
certificate authority, so you'll have to verify the certificates 
involved yourself. But read up on it if security is a concern of yours 
(and it presumably is, or you wouldn't be using SSL).
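
For what it's worth, the SSL object does at least expose the 
certificate's subject and issuer as strings, so you can eyeball them 
(reusing ssl_sock from the sketch above):

print ssl_sock.server()    # subject of the server's certificate
print ssl_sock.issuer()    # who signed it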

Regards,
        Jeff

VanL wrote:

> Hello,
> 
> For various reasons, I have to use https to download large (20+ MB) text
> files which I then parse.  I set up a basic function to do this using
> urllib:
> 
> import urllib
> response = urllib.urlretrieve(serverURL, 'run.log')
> 
> However, I then get a MemoryError.  Tracking down the source of the
> error, I see the offending function in httplib:
> 
>     def makefile(self, mode, bufsize=None):
>         """Return a readable file-like object with data from socket.
> 
>         This method offers only partial support for the makefile
>         interface of a real socket.  It only supports modes 'r' and
>         'rb' and the bufsize argument is ignored.
> 
>         The returned object contains *all* of the file data
>         """
> 
> I think the problem is that the bufsize argument is ignored.  Does
> anyone know if this is correct, and what I can do about it?  I would
> like to automate the process of downloading this file; is that even
> possible?
> 
> Thanks,
> 
> VanL



