Urllib2 urlopen and read - difference
aahz at pythoncraft.com
Mon Apr 26 04:13:29 CEST 2010
In article <mailman.1917.1271357827.23598.python-list at python.org>,
J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>On Thu, 2010-04-15 at 11:25 -0700, koranthala wrote:
>> Suppose I am doing the following:
>> req = urllib2.urlopen('http://www.python.org')
>> data = req.read()
>> When is the actual data received? Is it done by the first line, or
>> is it done only when req.read() is used?
>> My understanding is that by the time urlopen itself returns, we have
>> already received all the data, and req.read() just reads it from the file.
>> But when I read the source code of pylot, it contained the following:
>>     # this sends the HTTP request and returns as soon as it is
>>     # done connecting and sending
>>     resp = opener.open(request)
>>     connect_end_time = self.default_timer()
>>     content = resp.read()
>>     req_end_time = self.default_timer()
>> Here, it seems to suggest that the data is received only after you do
>> resp.read(), which made me all confused.
>My understanding (please correct me if I'm wrong), is that when you call
>open, you send a request to the server, and get a response object back.
>The server immediately begins sending data (you can't control when they
>send it, once you've requested it). When you call read() on your
>response object, it reads all the data it has already received, and if
>that amount of data isn't sufficient to handle your read call, it blocks
>until it has enough.
>So your opener returns as soon as the request is sent, and read() blocks
>if it doesn't have enough data to handle your request.
Close. urlopen() returns after it receives the HTTP header (that's why
you can get an HTTP exception on e.g. 404 without the read()).
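
A quick way to check this for yourself is sketched below. Treat it as a
sketch under stated assumptions, not a definitive test: it uses Python 3,
where urllib2's functionality lives in urllib.request, and it talks to a
throwaway local server instead of python.org. SlowBodyHandler, BODY_DELAY,
and time_urlopen_and_read are names made up for the illustration. The
server deliberately pauses between sending the header and the body, so the
two phases can be timed separately.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pause (seconds) between the header and the body.
BODY_DELAY = 0.5


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Illustrative handler: sends the header at once, the body later."""

    def do_GET(self):
        if self.path == "/missing":
            self.send_error(404)
            return
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()          # header goes out on the wire here
        time.sleep(BODY_DELAY)      # body is held back on purpose
        self.wfile.write(body)

    def log_message(self, *args):
        pass                        # keep the demo quiet


def time_urlopen_and_read():
    """Time how long open() and read() each take against the slow server."""
    server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    # An empty ProxyHandler keeps any proxy settings out of the picture.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        start = time.monotonic()
        resp = opener.open(base + "/")      # returns once the header is in
        header_time = time.monotonic() - start
        status = resp.status                # available before any read()
        data = resp.read()                  # blocks until the body arrives
        total_time = time.monotonic() - start
        resp.close()

        try:
            opener.open(base + "/missing")  # no read() is ever called
            error_code = None
        except urllib.error.HTTPError as exc:
            error_code = exc.code
            exc.close()
    finally:
        server.shutdown()
        server.server_close()
    return header_time, total_time, status, data, error_code


if __name__ == "__main__":
    header_time, total_time, status, data, error_code = time_urlopen_and_read()
    print("open() returned after %.3fs with status %s" % (header_time, status))
    print("read() finished after %.3fs, %d bytes" % (total_time, len(data)))
    print("error code raised by open() alone: %s" % error_code)
```

If the description above is right, header_time should come out well under
BODY_DELAY while total_time is at least BODY_DELAY, and the 404 is raised
by open() itself, before any read().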
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
"It is easier to optimize correct code than to correct optimized code."