[Python-bugs-list] [ python-Bugs-508157 ] urllib.urlopen results.readline is slow
noreply@sourceforge.net
Thu, 14 Mar 2002 15:32:35 -0800
Bugs item #508157, was opened at 2002-01-24 16:48
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=508157&group_id=5470
Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Keith Davidson (kbdavidson)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.urlopen results.readline is slow
Initial Comment:
The socket file object underlying the return value of
urllib.urlopen() is opened without any buffering, which
makes results.readline() very slow. The specific problem
is in the httplib.HTTPResponse constructor: it calls
sock.makefile() with a buffer size of 0. Forcing the
buffer size to 4096 cuts the time for calling readline()
on a 60K-character line from 16 seconds to 0.27 seconds
(other processing is going on here, but the magnitude of
the difference is correct).
I am using Python 2.0, so I cannot easily submit a patch,
but the problem appears to still be present in the 2.2
source. The specific change is to replace the 0 in
sock.makefile() with 4096 or some other reasonable
buffer size:
class HTTPResponse:
    def __init__(self, sock, debuglevel=0):
        self.fp = sock.makefile('rb', 0)  # <= change 0 to 4096
        self.debuglevel = debuglevel
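
The cost of an unbuffered readline() can be illustrated without a network connection. The following is a minimal sketch in modern Python 3, not the Python 2.0/2.2 httplib code discussed in this report. CountingRaw is a hypothetical stand-in for an unbuffered socket file that counts how often the low-level read is invoked; the 60000-byte line and the 4096-byte buffer mirror the figures quoted above.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for an unbuffered socket file.

    Serves bytes from memory and counts each low-level read request,
    which plays the role of a recv() call on a real socket.
    """

    def __init__(self, data):
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        return self._src.readinto(b)


line = b"x" * 60000 + b"\n"

# Unbuffered: readline() on a raw stream has no buffer to scan for the
# newline, so it asks the underlying object for data in tiny pieces.
raw = CountingRaw(line)
unbuffered_line = raw.readline()
unbuffered_calls = raw.calls

# Buffered: the same source wrapped with a 4096-byte buffer, analogous
# to passing 4096 instead of 0 to sock.makefile().
raw2 = CountingRaw(line)
buffered = io.BufferedReader(raw2, buffer_size=4096)
buffered_line = buffered.readline()
buffered_calls = raw2.calls

print("unbuffered low-level reads:", unbuffered_calls)
print("buffered low-level reads:  ", buffered_calls)
```

Both variants return the same line; only the number of low-level reads differs. On a real socket each of those reads would be a separate system call, which is a plausible explanation for a slowdown of the magnitude reported here.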
----------------------------------------------------------------------
>Comment By: A.M. Kuchling (akuchling)
Date: 2002-03-14 18:32
Message:
Logged In: YES
user_id=11375
Greg Stein originally wrote it; I'll ping him. I suspect
it might be because of HTTP pipelining: if multiple
responses will be returned over a socket, you probably
can't use buffering because the buffer might consume the
end of response #1 and the start of response #2.
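
The over-read suspected above can be reproduced with a local socket pair. This is a minimal sketch in modern Python 3, not the httplib code under discussion; the two response payloads are made up for illustration, and it assumes the first buffered read pulls in everything already queued on the socket, which is the usual behavior for a small payload.

```python
import socket

# Two responses sent back to back on one connection, as with pipelining.
# Both payloads are illustrative assumptions.
resp1 = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
resp2 = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"

server, client = socket.socketpair()
server.sendall(resp1 + resp2)
server.close()

# Reader for response #1, with a 4096-byte buffer.
fp1 = client.makefile("rb", 4096)
status1 = fp1.readline()

# A second, unbuffered file object on the same socket, as a new
# response object would create for response #2.
fp2 = client.makefile("rb", 0)
seen_by_fp2 = fp2.read()

# Whatever fp1 buffered beyond the status line, including response #2.
held_by_fp1 = fp1.read()

print("status line of response #1:", status1)
print("bytes visible to reader #2:", seen_by_fp2)
print("response #2 trapped in buffer #1:", held_by_fp1.endswith(resp2))

fp1.close()
fp2.close()
client.close()
```

Filling the buffer for the first status line already drains both responses from the socket, so the second reader finds nothing left to read. With a buffer size of 0 each reader takes only the bytes it asks for, which would explain the original choice at the price of slow readline() calls.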
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-25 09:12
Message:
Logged In: YES
user_id=6380
I wonder why the author explicitly turned off buffering.
There probably was a reason? Without knowing why, we can't
just change it.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-01-24 16:54
Message:
Logged In: NO
What platform?
--Guido (not logged in)
----------------------------------------------------------------------