Downloading huge files via urllib
Jeff Davis
jdavis at empires.org
Tue Sep 24 05:41:30 EDT 2002
Hi,
It's certainly possible. urllib is meant as a high-level interface that
hides those kinds of details, so you may have trouble making it work there.
Instead, I would suggest using the socket module directly: create a socket,
wrap it with SSL, and then read an arbitrary number of bytes from it at a
time. That way you can be sure never to take in more than, say, 500 bytes
at once, which avoids your MemoryError.
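Here's a rough sketch of what I mean, using the Python 2.x socket.ssl()
interface (the hostname, path, and chunk size are just placeholders --
substitute your own):

import socket

host = 'example.com'          # placeholder: your server
path = '/run.log'             # placeholder: path to the big file
CHUNK = 500                   # never read more than this many bytes at once

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, 443))
ssl_sock = socket.ssl(sock)   # no certificate verification, per the docs

# A bare HTTP/1.0 request, so the server closes the connection when done.
ssl_sock.write('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host))

out = open('run.log', 'wb')
while 1:
    try:
        data = ssl_sock.read(CHUNK)
    except socket.sslerror:
        break                 # some servers just drop the connection at EOF
    if not data:
        break
    out.write(data)
out.close()
sock.close()

One caveat: what you read back starts with the HTTP response headers, so
you'll want to skip past the first blank line before parsing the file.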
Note that there is a warning in the docs for socket.ssl():
"Warning: This does not do any certificate verification!"
I think that just means it doesn't check the certificate against a
certificate authority, so you'll have to do your own verification of the
certificates involved. But do read up on it if security is a concern of
yours (which it presumably is, or you wouldn't be using SSL).
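As a rough sanity check, the object socket.ssl() returns can at least tell
you whose certificate the server presented. This doesn't prove the
certificate is genuine, it just lets you eyeball it:

# Assuming ssl_sock is the socket.ssl() object from the sketch above.
print 'subject:', ssl_sock.server()   # subject of the server's certificate
print 'issuer: ', ssl_sock.issuer()   # who signed it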
Regards,
Jeff
VanL wrote:
> Hello,
>
> For various reasons, I have to use https to download large (20+ MB) text
> files which I then parse. I set up a basic function to do this using
> urllib:
>
> response = urllib.urlretrieve(serverURL, 'run.log')
>
> However, I then get a MemoryError. Tracking down the source of the
> error, I see the offending function in httplib:
>
> def makefile(self, mode, bufsize=None):
> """Return a readable file-like object with data from socket.
>
> This method offers only partial support for the makefile
> interface of a real socket. It only supports modes 'r' and
> 'rb' and the bufsize argument is ignored.
>
> The returned object contains *all* of the file data
> """
>
> I think the problem is the bufsize argument being ignored. Does
> anyone know if this is correct, and what I can do about it? I would
> like to automate the process of downloading this file. Is that
> possible?
>
> Thanks,
>
> VanL