Urllib2 urlopen and read - difference
Aahz
aahz at pythoncraft.com
Sun Apr 25 22:13:29 EDT 2010
In article <mailman.1917.1271357827.23598.python-list at python.org>,
J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>On Thu, 2010-04-15 at 11:25 -0700, koranthala wrote:
>>
>> Suppose I am doing the following:
>> req = urllib2.urlopen('http://www.python.org')
>> data = req.read()
>>
>> When is the actual data received? Is it done by the first line, or
>> only when req.read() is called?
>> My understanding is that by the time urlopen returns, we have already
>> received all the data, and req.read() just reads it from the file
>> descriptor.
>> But when I read the source code of pylot, I found the following:
>> # this sends the HTTP request and returns as soon as it is done
>> # connecting and sending
>> resp = opener.open(request)
>> connect_end_time = self.default_timer()
>> content = resp.read()
>> req_end_time = self.default_timer()
>>
>> This seems to suggest that the data is received only when you call
>> resp.read(), which confused me.
>
>My understanding (please correct me if I'm wrong) is that when you call
>open, you send a request to the server and get a response object back.
>The server immediately begins sending data (you can't control when it
>sends it, once you've requested it). When you call read() on your
>response object, it reads the data that has already been received, and
>if that isn't enough to satisfy your read call, it blocks until it has
>enough.
>
>So your opener returns as soon as the request is sent, and read() blocks
>if it doesn't have enough data to handle your request.
Close. urlopen() returns once it has received the HTTP response headers
(that's why you can get an HTTPError on e.g. a 404 without ever calling
read()).
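
A rough way to see this for yourself is sketched below. It is only an
illustration under some assumptions: it uses Python 3's urllib.request
(the successor to urllib2) rather than urllib2 itself, and it talks to a
throwaway local server that sends its headers, pauses, and then sends the
body. The names SlowBodyHandler, BODY_DELAY and measure are made up for
this example and do not come from pylot or the standard library.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 1000
BODY_DELAY = 0.5  # seconds the (hypothetical) server waits before sending the body


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Sends the status line and headers at once, then the body after a pause."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()          # headers go out on the wire here
        time.sleep(BODY_DELAY)      # the client already has the headers by now
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the example quiet


def measure():
    """Return (seconds until open() returned, seconds until read() returned,
    number of body bytes, whether a 404 raised HTTPError without read())."""
    server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    # An empty ProxyHandler keeps any proxy environment settings out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        start = time.perf_counter()
        resp = opener.open(base + "/")      # returns once the headers are in
        headers_at = time.perf_counter() - start
        data = resp.read()                  # blocks until the body arrives
        body_at = time.perf_counter() - start
        resp.close()
        try:
            opener.open(base + "/missing")  # note: no read() call here
            got_404 = False
        except urllib.error.HTTPError as err:
            got_404 = err.code == 404
            err.close()
    finally:
        server.shutdown()
        server.server_close()
    return headers_at, body_at, len(data), got_404


if __name__ == "__main__":
    headers_at, body_at, size, got_404 = measure()
    print("open() returned after %.3fs" % headers_at)
    print("read() returned after %.3fs (%d bytes)" % (body_at, size))
    print("404 raised HTTPError without read():", got_404)
```

With this setup, open() should come back well before the delayed body is
sent, and read() should account for the rest of the elapsed time, which
matches the two timestamps pylot records.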
--
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan