[Tutor] urlopen object and read() method

Christopher Smith csmith@blakeschool.org
Mon Apr 21 16:55:01 2003


I submitted the following to sourceforge regarding the failure of the
read() method on a urllib object to read until the EOF (unless it is my
failure to understand how this works):

In the 2.2.2 docs on http://python.org/doc/current/lib/module-urllib.html
it says that the object returned by urlopen supports the read() method and
that this and other methods "have the same interface as for file objects
-- see section 2.2.8".  In that section on page
http://python.org/doc/current/lib/bltin-file-objects.html it says about
the read() method that "if the size argument is negative or omitted, [read
should] read all data until EOF is reached."

I was a bit surprised when a project that students of mine were working on
failed when they tried to process the data obtained by the read()
method on a connection made to a web page.  The problem, apparently, is
that the read may not obtain all of the data requested in the first
request, and the total response has to be built up something like this:

import urllib

c = urllib.urlopen("http://www.blakeschool.org")
data = ''
while 1:
    # a single read() may return only part of the response
    packet = c.read()
    if packet == '':    # an empty string signals EOF
        break
    data += packet
	
I'm not sure if this is a feature or a bug.  Could a file's read method
fail to obtain the whole file in one read(), too?  It seems that either
the documentation should be changed or the read() method for at least
urllib objects should be changed.
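
To try the read-until-empty loop without a network connection, here is a
minimal sketch written for Python 3 (not the 2.2 interpreter discussed
above).  ShortReadStream and read_all are hypothetical names made up for
this illustration; the stand-in class only imitates an object whose read()
hands back less than the full data, and is not how urllib itself is
implemented.

```python
class ShortReadStream:
    """Hypothetical stand-in for a connection whose read() may return
    fewer bytes than are actually available."""

    def __init__(self, payload, max_chunk):
        self._payload = payload
        self._pos = 0
        self._max_chunk = max_chunk

    def read(self, size=-1):
        # Never hand back more than max_chunk bytes per call, even when
        # size is omitted or negative (the behaviour described above).
        if size is None or size < 0 or size > self._max_chunk:
            size = self._max_chunk
        chunk = self._payload[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def read_all(stream):
    """Call read() repeatedly until it returns an empty result."""
    chunks = []
    while True:
        packet = stream.read()
        if not packet:          # empty bytes object means EOF
            break
        chunks.append(packet)
    return b"".join(chunks)


payload = b"x" * 10 + b"y" * 7
stream = ShortReadStream(payload, max_chunk=4)
first = stream.read()           # only part of the payload comes back
rest = read_all(stream)         # the loop collects everything that is left
assert first != payload
assert first + rest == payload
```

Collecting the pieces in a list and joining them once avoids repeatedly
copying a growing string, which matters only for large responses.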

/c

Christopher P. Smith
The Blake School
Minneapolis, MN