[Python-bugs-list] [ python-Bugs-511073 ] urllib problems

noreply@sourceforge.net noreply@sourceforge.net
Mon, 22 Apr 2002 06:24:19 -0700


Bugs item #511073, was opened at 2002-01-31 08:25
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=511073&group_id=5470

Category: Macintosh
Group: Python 2.2
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Yair Benita (ybenita)
Assigned to: Jack Jansen (jackjansen)
Summary: urllib problems

Initial Comment:
When using urllib.urlopen("url") and then reading 
the file with handle.read(), I get only parts of pages. 
It works for short web pages, but if I use it to 
download large pages the result always comes out too 
short. To me it looks as if it tries to read the file before 
it is fully downloaded. Jack Jansen said: "MacPython may 
do short reads on sockets. I've always maintained 
that this was correct (which reasoning was quietly 
accepted by everyone here), but last year I finally 
admitted that it may actually be incorrect (which 
was again quietly accepted :-)"

example:
import urllib
x = urllib.urlopen("http://www.ebi.ac.uk/cgi-bin/emblfetch?db=embl&format=fasta&style=raw&id=AB002378")
print x.read()

Compare the file downloaded by any web browser 
with the file from MacPython.

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-04-22 15:24

Message:
Logged In: YES 
user_id=45365

This was fixed some time ago (the fix made it into 2.2.1) by modifying the underlying GUSI I/O library. Apparently I forgot to close the bug report, so I'm doing so now.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-02-06 01:34

Message:
Logged In: YES 
user_id=45365

I have probably found the cause of this; the only task remaining is finding out who to blame :-)

httplib explicitly sets non-buffering I/O on the file corresponding to the socket, by calling
self.fp = socket.makefile("rb", 0).

MSL, the CodeWarrior I/O library, has an optimization (or bug :-): if you fread() from a binary
file with buffering turned off, it calls the underlying read() straight away.

Python's fileobject.c file_read() reacts to a short fread() return value by returning what it has read so far instead of calling fread() again.

One of these three is wrong, apparently.
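
The interaction of the three points above can be sketched in a few lines. This is only an illustration under stated assumptions, not the fix that went into GUSI: ShortReadStream and read_fully are hypothetical names, the stream is an in-memory stand-in for an unbuffered socket file, and the code is written for a current Python rather than MacPython 2.2. A single read() stops at the first short result, while a caller that loops until read() returns an empty string gets the whole payload.

```python
import io


class ShortReadStream(object):
    """Hypothetical stand-in for an unbuffered socket file.

    Each read() hands back at most max_per_read bytes, no matter how
    many were requested, mimicking a short read on a socket.
    """

    def __init__(self, payload, max_per_read):
        self._source = io.BytesIO(payload)
        self._max_per_read = max_per_read

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._max_per_read
        return self._source.read(min(size, self._max_per_read))


def read_fully(fp, chunk_size=8192):
    """Keep calling fp.read() until it returns an empty result (EOF)."""
    chunks = []
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


payload = b"x" * 10000

# One read() call: the caller sees only the first short read.
single = ShortReadStream(payload, 1024).read(len(payload))

# Looping until EOF: the caller sees the complete payload.
looped = read_fully(ShortReadStream(payload, 1024))
```

The loop assumes a blocking stream, where an empty result means end of file; a non-blocking socket would need separate handling.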


----------------------------------------------------------------------
