[Python-Dev] urllib performance issue on FreeBSD 4.x

Guido van Rossum guido@python.org
Sun, 24 Nov 2002 07:19:07 -0500


> I've been following up a thread on python-list about lousy performance of
> urllib.urlopen(...).read() on FreeBSD 4.x compared to using wget to
> retrieve the same file.
> 
> I've determined that the following patch (against 2.2.2) makes an enormous
> difference in throughput:
> 
> -----8<-----8<-----8<-----
> *** Lib/httplib.py.orig Mon Oct  7 11:18:17 2002
> --- Lib/httplib.py      Sun Nov 24 14:44:16 2002
> ***************
> *** 210,216 ****
>       # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
> 
>       def __init__(self, sock, debuglevel=0, strict=0):
> !         self.fp = sock.makefile('rb', 0)
>           self.debuglevel = debuglevel
>           self.strict = strict
> 
> --- 210,216 ----
>       # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
> 
>       def __init__(self, sock, debuglevel=0, strict=0):
> !         self.fp = sock.makefile('rb', -1)
>           self.debuglevel = debuglevel
>           self.strict = strict
> 
> -----8<-----8<-----8<-----
> 
> Without this patch, downloading a 4MB file from localhost runs at a bit over
> 110kB/s; with the patch it runs at 4-5.5MB/s on the same system (FBSD 4.4 SMP,
> 2xC300A, 128MB RAM, ATA66 HD).
> 
> My question:
> 
> - why is the socket.fp being set to unbuffered?
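
A rough sketch of one reason an unbuffered file object can be so much slower. It assumes modern Python 3's io module, not the 2.2 socket code this thread is about. With no buffer, readline() has to fetch one byte per read, and on a real socket each of those reads is a recv() system call. CountingRaw and read_status_line are hypothetical illustration names; CountingRaw is an in-memory stand-in that only counts readinto() calls.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory stand-in for a socket. Each readinto() call
    would correspond to one recv() system call on a real socket."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def read_status_line(buffered):
    """Read the first line of a fake HTTP response and report how many
    low-level reads it took."""
    raw = CountingRaw(b"HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\nBODY")
    # Unbuffered: readline() falls back to read(1) in a loop.
    # Buffered: BufferedReader fills its buffer with one large readinto().
    fp = io.BufferedReader(raw) if buffered else raw
    line = fp.readline()
    return line, raw.calls
```

Here read_status_line(False) returns the 17-byte status line after 17 readinto() calls (one per byte), while read_status_line(True) gets it with a single call. This only models line reading; it does not show how 2.2's unbuffered read() of the body behaves.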

I can't make time for a full essay on the issue, but I believe that it
must be unbuffered because some applications want to read until the
end of the headers and then pass the file descriptor to a subprocess
or to code that uses the socket directly.
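
A minimal sketch of the problem a buffered file object would cause for such applications. It assumes Python 3's socket.makefile() and socket.socketpair() rather than the 2.2 httplib code in the patch. A buffered reader typically pulls more than the headers off the socket in a single recv(), so the bytes after the blank line end up in the file object's private buffer instead of remaining in the socket. body_left_in_socket is a hypothetical helper name.

```python
import socket


def body_left_in_socket(buffering):
    """Read the status line and headers through sock.makefile(), then
    return whatever bytes can still be recv()'d from the socket itself."""
    server, client = socket.socketpair()
    try:
        server.sendall(b"HTTP/1.0 200 OK\r\n"
                       b"Content-Type: text/plain\r\n"
                       b"\r\n"
                       b"BODY")
        server.shutdown(socket.SHUT_WR)  # signal EOF after the body

        fp = client.makefile("rb", buffering=buffering)
        # Consume lines up to and including the blank line ending the headers.
        while fp.readline() not in (b"\r\n", b"\n", b""):
            pass

        # Anything the file object did not pull into its own buffer is still
        # readable straight from the socket, e.g. by a subprocess that was
        # handed the file descriptor.
        client.settimeout(1.0)
        leftover = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            leftover += chunk
        fp.close()
        return leftover
    finally:
        server.close()
        client.close()
```

With buffering=0, body_left_in_socket returns b"BODY"; with buffering=-1 (the setting the patch uses) the body has already been swallowed by the file object's buffer, and it returns b"".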

--Guido van Rossum (home page: http://www.python.org/~guido/)