[ python-Bugs-1016880 ] urllib.urlretrieve silently truncates downloads
SourceForge.net
noreply at sourceforge.net
Fri Dec 24 15:30:05 CET 2004
Bugs item #1016880, was opened at 2004-08-26 15:58
Message generated for change (Comment added) made by irmen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1016880&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 6
Submitted By: David Abrahams (david_abrahams)
Assigned to: Johannes Gijsbers (jlgijsbers)
Summary: urllib.urlretrieve silently truncates downloads
Initial Comment:
The following script appears to be unreliable on all
versions of Python we can find. The file being
downloaded is approximately 34 MB. Browsers such as
IE and Mozilla have no problem downloading the whole
thing.
----
import urllib
import os

os.chdir('/tmp')
# Fetch the ~34 MB tarball into /tmp under the given local name.
urllib.urlretrieve(
    'http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
    'boost-cvsroot.tar.bz2')
----------------------------------------------------------------------
Comment By: Irmen de Jong (irmen)
Date: 2004-12-24 15:30
Message:
Logged In: YES
user_id=129426
Suggested addition to the doc of urllib (liburllib.tex, if
I'm not mistaken):
"""
urlretrieve will raise IOError when it detects that the amount of
data available was less than the expected amount (which is the size
reported by a Content-Length header). This can occur, for example,
when the download is interrupted.

The Content-Length is treated as a lower bound (just like tools such
as wget and Firefox appear to do): if there's more data to read,
urlretrieve reads more data, but if less data is available, it
raises IOError.

If no Content-Length header was supplied, urlretrieve cannot check
the size of the data it has downloaded, and just returns it. In this
case you just have to assume that the download was successful.
"""
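The rule proposed above (Content-Length as a lower bound, an error only when fewer bytes arrive, no check at all when the header is missing) can be sketched in present-day Python 3. This is a minimal sketch, not urllib's code: `copy_with_length_check` and `IncompleteDownload` are hypothetical names, and in-memory `io.BytesIO` objects stand in for the HTTP response and the output file. For comparison, Python 3's `urllib.request.urlretrieve` reports this situation by raising `urllib.error.ContentTooShortError`.

```python
import io


class IncompleteDownload(IOError):
    """Hypothetical error for this sketch: fewer bytes than Content-Length."""


def copy_with_length_check(src, dst, content_length=None, blocksize=8192):
    """Copy src to dst until EOF and return the number of bytes copied.

    content_length is treated as a lower bound: extra data is accepted,
    missing data raises IncompleteDownload. If it is None (no header),
    nothing can be checked and the data is accepted as-is.
    """
    total = 0
    while True:
        block = src.read(blocksize)
        if not block:  # an empty read means EOF
            break
        dst.write(block)
        total += len(block)
    if content_length is not None and total < content_length:
        raise IncompleteDownload(
            "retrieval incomplete: got only %d out of %d bytes"
            % (total, content_length))
    return total


# Usage: a simulated server that stops after 100 of 250 announced bytes.
try:
    copy_with_length_check(io.BytesIO(b"x" * 100), io.BytesIO(), content_length=250)
except IncompleteDownload as exc:
    print(exc)  # retrieval incomplete: got only 100 out of 250 bytes
```

Raising after the loop, rather than inside it, keeps the "more data than announced is fine" behaviour: the loop always runs to EOF and only the final byte count is compared.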
----------------------------------------------------------------------
Comment By: Irmen de Jong (irmen)
Date: 2004-11-07 21:17
Message:
Logged In: YES
user_id=129426
A patch is at #1062060 (it raises IOError when the download is
incomplete).
----------------------------------------------------------------------
Comment By: Irmen de Jong (irmen)
Date: 2004-11-07 20:47
Message:
Logged In: YES
user_id=129426
Confirmed here (Mandrakelinux 10.0, Python 2.4b2).
However, I doubt it is a problem in urllib.urlretrieve,
because I tried downloading the file with wget and got the
following:
[irmen@isengard tmp]$ wget -S http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2
--20:38:11-- http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2
=> `boost-cvsroot.tar.bz2.1'
Resolving cvs.sourceforge.net... 66.35.250.207
Connecting to cvs.sourceforge.net[66.35.250.207]:80... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Date: Sun, 07 Nov 2004 19:38:15 GMT
3 Server: Apache/2.0.40 (Red Hat Linux)
4 Last-Modified: Sat, 06 Nov 2004 15:11:39 GMT
5 ETag: "b63d5b-25c3808-687d80c0"
6 Accept-Ranges: bytes
7 Content-Length: 39598088
8 Content-Type: application/x-bzip2
9 Connection: close
31% [=======================> ] 12,665,616 60.78K/s ETA 03:55
20:40:07 (111.60 KB/s) - Connection closed at byte 12665616. Retrying.
--20:40:08-- http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2
(try: 2) => `boost-cvsroot.tar.bz2.1'
Connecting to cvs.sourceforge.net[66.35.250.207]:80... connected.
HTTP request sent, awaiting response...
.......
So the remote server just closed the connection halfway
through! I suspect that a successful download is sheer luck.
Also, the download loop in urllib looks fine to me. It only
stops when the read() returns an empty result, and that
means EOF.
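The point about the loop can be illustrated with a small Python 3 simulation. It is a hedged sketch rather than urllib's actual source: `read_until_eof` is a hypothetical helper modelled on the loop described above, and `io.BytesIO` objects stand in for a complete response and for one whose server closed the connection early. Both end with the same empty read, so the loop by itself cannot tell them apart; only a comparison with the announced Content-Length can.

```python
import io


def read_until_eof(stream, blocksize=8192):
    """Read fixed-size blocks until read() returns b"", i.e. EOF."""
    received = bytearray()
    while True:
        block = stream.read(blocksize)
        if not block:  # EOF: a normal end and an early close look the same
            break
        received += block
    return bytes(received)


content_length = 40000  # stand-in for the size the server announced
complete = read_until_eof(io.BytesIO(b"\0" * content_length))
truncated = read_until_eof(io.BytesIO(b"\0" * 12665))  # simulated early close

print(len(complete), len(truncated))  # 40000 12665
print(len(truncated) < content_length)  # True: only this check reveals it
```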
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-26 22:04
Message:
Logged In: YES
user_id=80475
Followed the same procedure (no chdir, add a hook) but
bombed out at 9.1 MB:
. . .
(1117, 8192, 34520156)
('boost-cvsroot.tar.bz2', <httplib.HTTPMessage instance at 0x00B1E4B8>)
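The hook prints (block count, block size, total size), so the last line above corresponds to roughly 1117 * 8192 = 9,150,464 bytes, about 26.5% of the 34,520,156 announced. A minimal Python 3 sketch of a hook doing that arithmetic follows; `estimate_progress` and `progress_hook` are hypothetical names, and the three-argument signature follows the reporthook that `urllib.request.urlretrieve` documents, where the total may be -1 if the size is unknown.

```python
def estimate_progress(blocknum, blocksize, totalsize):
    """Return (bytes_so_far, percent) from urlretrieve-style hook arguments.

    blocknum * blocksize can overshoot by up to one block because the last
    block may be short, so the byte count is capped at totalsize. percent is
    None when totalsize is unknown (reported as -1).
    """
    done = blocknum * blocksize
    if totalsize > 0:
        done = min(done, totalsize)
        return done, 100.0 * done / totalsize
    return done, None


def progress_hook(blocknum, blocksize, totalsize):
    """Hypothetical reporthook printing an estimate instead of raw tuples."""
    done, percent = estimate_progress(blocknum, blocksize, totalsize)
    if percent is None:
        print("%d bytes" % done)
    else:
        print("%d bytes (%.1f%%)" % (done, percent))


progress_hook(1117, 8192, 34520156)  # 9150464 bytes (26.5%)
```

Such a hook only makes a shortfall visible; a truncated run still ends without an error unless the final count is compared with the total.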
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-08-26 20:52
Message:
Logged In: YES
user_id=31435
Hmm. I don't know anything about this, but thought I'd just
try it. Didn't chdir(), did add a reporthook:
def hook(*args):
    print args
WinXP Pro SP1, current CVS Python, cable modem over a
wireless router. Output looked like this:
(0, 8192, 34520156)
(1, 8192, 34520156)
(2, 8192, 34520156)
...
(4213, 8192, 34520156)
(4214, 8192, 34520156)
(4215, 8192, 34520156)
Had the whole file when it ended:
> wc boost-cvsroot.tar.bz2
125368 765656 34520156 boost-cvsroot.tar.bz2
*Maybe* adding the reporthook changed timing in some
crucial way. Don't know.
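The `wc` check above (third column is the byte count, 34520156) is the manual form of comparing the file on disk with the size the server announced. A hedged Python 3 sketch of the same check follows; `verify_size` is a hypothetical helper, and the expected size is assumed to come from the reporthook's third argument or from the Content-Length header.

```python
import os
import tempfile


def verify_size(path, expected):
    """Return True if the file at path holds at least `expected` bytes.

    Follows the Content-Length-as-lower-bound rule; expected < 0 means the
    size was unknown, so no verdict is possible and None is returned.
    """
    if expected < 0:
        return None
    return os.path.getsize(path) >= expected


# Usage with a throwaway file standing in for boost-cvsroot.tar.bz2.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 1024)
print(verify_size(tmp.name, 1024))  # True
print(verify_size(tmp.name, 4096))  # False
os.remove(tmp.name)
```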
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-26 19:09
Message:
Logged In: YES
user_id=80475
Confirmed. On Py2.4 (current CVS), I got 12.7 MB before the
connection closed.
----------------------------------------------------------------------
More information about the Python-bugs-list mailing list