Urllib2 urlopen and read - difference

Sun Apr 25 22:13:29 EDT 2010

In article <mailman.1917.1271357827.23598.python-list at python.org>,
J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>On Thu, 2010-04-15 at 11:25 -0700, koranthala wrote:
>>
>>    Suppose I am doing the following:
>> req = urllib2.urlopen('http://www.python.org')
>> data = req.read()
>> 
>>    When is the actual data received? Is it received by the first line,
>> or only when req.read() is called?
>>   My understanding is that by the time urlopen returns, we have already
>> received all the data, and req.read() just reads it from the file
>> descriptor.
>>   But, when I read the source code of pylot, it mentioned the
>> following:
>>             # this sends the HTTP request and returns as soon as it is
>>             # done connecting and sending
>>             resp = opener.open(request)
>>             connect_end_time = self.default_timer()
>>             content = resp.read()
>>             req_end_time = self.default_timer()
>> 
>> Here, it seems to suggest that the data is received only after you call
>> resp.read(), which confused me.
>
>My understanding (please correct me if I'm wrong) is that when you call
>open, you send a request to the server, and get a response object back.
>The server immediately begins sending data (you can't control when they
>send it, once you've requested it).  When you call read() on your
>response object, it reads all the data it has already received, and if
>that amount of data isn't sufficient to handle your read call, it blocks
>until it has enough.
>
>So your opener returns as soon as the request is sent, and read() blocks
>if it doesn't have enough data to handle your request.

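Close.  urlopen() does not return as soon as the request is sent; it
returns after it has received the HTTP status line and headers (that's
why you can get an HTTP exception on e.g. a 404 without calling read()).
The body is only pulled off the socket when you call read().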
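
One way to see this for yourself is to time the two calls against a server
that deliberately pauses between the headers and the body.  The following
is only a rough sketch, not anything taken from pylot: it uses Python 3,
where urllib2 was renamed to urllib.request, and it spins up a throwaway
local server so no network access is needed.  The names SlowBodyHandler,
BODY_DELAY and time_open_and_read are made up for illustration.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY_DELAY = 0.5  # seconds the toy server waits between headers and body


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: sends headers at once, the body after a pause."""

    def do_GET(self):
        if self.path == "/missing":
            self.send_error(404)
            return
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()       # status line and headers go out here
        time.sleep(BODY_DELAY)   # the body is held back for a while
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def time_open_and_read():
    """Return (open_time, read_time, data, got_404) for the toy server."""
    server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    # An empty ProxyHandler keeps any proxy environment variables out of it.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        start = time.monotonic()
        resp = opener.open(base + "/")   # returns once headers have arrived
        headers_at = time.monotonic()
        data = resp.read()               # blocks until the body has arrived
        body_at = time.monotonic()
        resp.close()

        # The 404 is raised by open() itself; read() is never called.
        try:
            opener.open(base + "/missing")
            got_404 = False
        except urllib.error.HTTPError as err:
            got_404 = err.code == 404
            err.close()
    finally:
        server.shutdown()
        server.server_close()
    return headers_at - start, body_at - headers_at, data, got_404


if __name__ == "__main__":
    open_time, read_time, data, got_404 = time_open_and_read()
    print("open() took %.3fs, read() took %.3fs" % (open_time, read_time))
    print("body: %r, 404 raised by open(): %s" % (data, got_404))
```

If the behaviour is as described, open() should come back almost
immediately while read() takes roughly BODY_DELAY, and the 404 shows up
without read() ever being called.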
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan