[Python-Dev] urllib performance issue on FreeBSD 4.x
Guido van Rossum
guido@python.org
Sun, 24 Nov 2002 07:19:07 -0500
> I've been following up a thread on python-list about lousy performance of
> urllib.urlopen(...).read() on FreeBSD 4.x compared to using wget to
> retrieve the same file.
>
> I've determined that the following patch (against 2.2.2) makes an enormous
> difference in throughput:
>
> -----8<-----8<-----8<-----
> *** Lib/httplib.py.orig Mon Oct 7 11:18:17 2002
> --- Lib/httplib.py Sun Nov 24 14:44:16 2002
> ***************
> *** 210,216 ****
>       # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
>
>       def __init__(self, sock, debuglevel=0, strict=0):
> !         self.fp = sock.makefile('rb', 0)
>           self.debuglevel = debuglevel
>           self.strict = strict
>
> --- 210,216 ----
>       # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
>
>       def __init__(self, sock, debuglevel=0, strict=0):
> !         self.fp = sock.makefile('rb', -1)
>           self.debuglevel = debuglevel
>           self.strict = strict
>
> -----8<-----8<-----8<-----
>
> Without this patch, d/l a 4MB file from localhost gets a bit over 110kB/s,
> with the patch gets 4-5.5MB/s on the same system (FBSD 4.4 SMP, 2xC300A,
> 128MB RAM, ATA66 HD).
>
> My question:
>
> - why is the socket.fp being set to unbuffered?
I can't make time for a full essay on the issue, but I believe that it
must be unbuffered because some applications want to read until the
end of the headers and then pass the file descriptor to a subprocess
or to code that uses the socket directly.
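
To illustrate the concern, here is a rough sketch, not the httplib code
itself. It assumes a modern Python 3 (the patch above is against 2.2.2) and
uses socket.socketpair() with a made-up response so that it runs without a
network. The helper name leftover_after_headers is hypothetical. It reads
the header block through a file object and then checks what a direct recv()
on the same socket can still see.

```python
import socket


def leftover_after_headers(buffering):
    """Read the headers via makefile(), then recv() directly on the socket.

    Returns whatever bytes the raw socket can still deliver afterwards.
    """
    reader, writer = socket.socketpair()
    try:
        # Made-up response: two header lines, a blank line, a 4-byte body.
        writer.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\nBODY")
        writer.close()  # so recv() reports EOF instead of blocking

        fp = reader.makefile("rb", buffering)
        # Consume lines up to and including the blank line ending the headers.
        while fp.readline() not in (b"\r\n", b""):
            pass

        # This stands in for a subprocess, or other code, that uses the
        # socket directly once the headers have been parsed.
        return reader.recv(1024)
    finally:
        reader.close()


if __name__ == "__main__":
    print("unbuffered:", leftover_after_headers(0))
    print("buffered:  ", leftover_after_headers(-1))
```

With buffering 0 the file object pulls one byte at a time, so it stops
exactly at the end of the headers and the body is still on the socket. With
the default buffer the first readline() normally pulls in everything that
has arrived, body included, so the direct reader finds nothing left. That
one-byte-at-a-time reading is also the likely reason the unbuffered case is
so slow.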
--Guido van Rossum (home page: http://www.python.org/~guido/)