[Python-Dev] urllib performance issue on FreeBSD 4.x

Andrew MacIntyre andymac@bullseye.apana.org.au
Sun, 24 Nov 2002 14:33:59 +1000 (est)


I've been following up a thread on python-list about lousy performance of
urllib.urlopen(...).read() on FreeBSD 4.x compared to using wget to
retrieve the same file.

I've determined that the following patch (against 2.2.2) makes an enormous
difference in throughput:

-----8<-----8<-----8<-----
*** Lib/httplib.py.orig Mon Oct  7 11:18:17 2002
--- Lib/httplib.py      Sun Nov 24 14:44:16 2002
***************
*** 210,216 ****
      # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.

      def __init__(self, sock, debuglevel=0, strict=0):
!         self.fp = sock.makefile('rb', 0)
          self.debuglevel = debuglevel
          self.strict = strict

--- 210,216 ----
      # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.

      def __init__(self, sock, debuglevel=0, strict=0):
!         self.fp = sock.makefile('rb', -1)
          self.debuglevel = debuglevel
          self.strict = strict

-----8<-----8<-----8<-----

Without this patch, downloading a 4MB file from localhost gets a bit over
110kB/s; with the patch it gets 4-5.5MB/s on the same system (FBSD 4.4 SMP,
2xC300A, 128MB RAM, ATA66 HD).

My question:

- why is self.fp (the file object returned by sock.makefile()) being set
  to unbuffered?

I can't check the FBSD library source at the moment (nor, for that matter,
the RFCs mentioned above), and can only speculate that fread() is resorting
to reading from the socket a character at a time.  So I'm not sure whether
this should be treated as a FreeBSD issue, a Python issue, or both.
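
To illustrate the kind of effect I'm speculating about, here is a minimal
sketch.  It does not touch FreeBSD's stdio or the 2.2.2 socket code; it
assumes a modern Python 3 io module, and CountingRaw and count_reads are
hypothetical names made up for the illustration.  It counts how often the
underlying "device" is asked for data when lines are read from an
unbuffered stream versus a buffered one:

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts how often it is asked for data.

    Hypothetical stand-in for an unbuffered socket file object.
    """

    def __init__(self, payload):
        super().__init__()
        self._src = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        # Every request that reaches the "device" is counted here.
        self.calls += 1
        return self._src.readinto(buf)


def count_reads(payload, buffered):
    """Read all lines; return (number of lines, number of raw reads)."""
    raw = CountingRaw(payload)
    stream = io.BufferedReader(raw, buffer_size=8192) if buffered else raw
    lines = 0
    while stream.readline():
        lines += 1
    return lines, raw.calls


payload = b"header: value\r\n" * 100
unbuffered = count_reads(payload, buffered=False)
buffered = count_reads(payload, buffered=True)
print("unbuffered: %d lines, %d raw reads" % unbuffered)
print("buffered:   %d lines, %d raw reads" % buffered)
```

In this sketch the unbuffered stream needs at least one raw read per byte,
while the buffered one needs only a handful for the whole payload.  Whether
the same thing is happening inside fread() on FreeBSD is exactly what I
haven't been able to confirm.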

Another poster in the same thread mentions seeing somewhat similar
performance problems on Win2k, although not nearly as bad.

FWIW, my test script is

-----8<-----8<-----8<-----
import time
import urllib
t1 = time.time()
u = urllib.urlopen("http://localhost/big_file").read()
t2 = time.time()
# len(u) is in bytes, so divide by 1024.0 to report kB/s
print 'throughput: %f kB/s' % (len(u) / 1024.0 / (t2 - t1))
-----8<-----8<-----8<-----

Reactions?

--
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au  | Snail: PO Box 370
        andymac@pcug.org.au            |        Belconnen  ACT  2616
Web:    http://www.andymac.org/        |        Australia