[New-bugs-announce] [issue2601] [regression] reading from a urllib2 file descriptor happens byte-at-a-time

Matthias Klose report at bugs.python.org
Tue Apr 8 23:15:30 CEST 2008


New submission from Matthias Klose <doko at debian.org>:

r61009 on the 2.5 branch

  - Bug #1389051, 1092502: fix excessively large memory allocations when
    calling .read() on a socket object wrapped with makefile(). 

causes a regression compared to 2.4.5 and 2.5.2:

When reading from a urllib2 file descriptor, Python reads the data a
byte at a time regardless of how much you ask for. Python versions up to
2.5.2 read the data in 8K chunks.

This has enough of a performance impact that it increases download time
for a large file over a gigabit LAN from 10 seconds to 34 minutes. (!)

Trivial/obvious example code:

  import urllib2

  f = urllib2.urlopen("http://launchpadlibrarian.net/13214672/nexuiz-data_2.4.orig.tar.gz")
  while 1:
    chunk = f.read()
    if not chunk:
      break

... and then strace the process to see the recv() calls chugging along,
one byte at a time.
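A quick way to see the impact without strace is to time the first chunk
of the transfer. This is only a sketch (it assumes the URL above is still
reachable and uses an arbitrary 1 MB read size), but on an affected build
the reported throughput should be orders of magnitude lower than on 2.5.2:

  import time
  import urllib2

  url = "http://launchpadlibrarian.net/13214672/nexuiz-data_2.4.orig.tar.gz"
  f = urllib2.urlopen(url)

  start = time.time()
  # Ask for 1 MB up front; the regressed code still recv()'s one byte at a time.
  data = f.read(1024 * 1024)
  elapsed = max(time.time() - start, 1e-6)

  print "read %d bytes in %.2f s (%.1f KB/s)" % (
      len(data), elapsed, len(data) / 1024.0 / elapsed)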

----------
assignee: akuchling
components: Library (Lib)
messages: 65219
nosy: akuchling, doko
priority: high
severity: normal
status: open
title: [regression] reading from a urllib2 file descriptor happens byte-at-a-time
type: performance
versions: Python 2.5

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2601>
__________________________________

