[Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
Tim Peters
tim.one@home.com
Tue, 2 Jan 2001 03:06:32 -0500
[Thomas Wouters]
> ...
> As for speed (which stays a secondary or tertiary consideration
> at best) do we really need the xreadlines method to accomplish
> that ? Couldn't fileinput get almost the same performance using
> readlines() with a sizehint ?
There was a long email discussion among Jeff, Paul Prescod, Neel
Krishnaswami, and Alex Martelli about this. I started getting copied on it
somewhere midstream, but didn't have time to follow it then (like I do now
<wink>).
About two weeks ago Neel summarized all the approaches then under
discussion:
"""
[Neel Krishnaswami]
...
Quick performance summary of the current solutions:
Slowest: for line in fileinput.input('foo'):    # Time 100
       : while 1: line = file.readline()        # Time 75
       : for line in LinesOf(open('foo')):      # Time 25
Fastest: for line in file.readlines():          # Time 10
         while 1: lines = file.readlines(hint)  # Time 10
         for line in xreadlines(file):          # Time 10
The difference in speed between the slowest and fastest is about
a factor of 10.
LinesOf is Alex's Python wrapper class that takes a file and
uses readlines() with a size-hint to present a sequence interface.
It's around half as fast as the fastest idioms, and 3-4 times
faster than while 1:. Jeff's xreadlines is essentially the same
thing in C, and is indistinguishable in performance from the
other fast idioms.
...
"""
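For readers who haven't seen it spelled out, the `while 1: lines = file.readlines(hint)` entry in Neel's table is the batched-read idiom. Here is a minimal sketch in present-day Python; the function name `count_lines_chunked` and the 64 KB default hint are assumptions made for illustration, not anything taken from the thread.

```python
import io


def count_lines_chunked(f, hint=64 * 1024):
    """Count the lines in f, reading them in batches via readlines(hint).

    `hint` is an approximate size budget per batch; readlines() returns
    whole lines until roughly that much data has been collected.
    """
    count = 0
    while True:
        lines = f.readlines(hint)  # one call fetches many lines
        if not lines:              # empty list means end of file
            break
        for line in lines:         # inner loop runs over a plain list
            count += 1
    return count


sample = io.StringIO("alpha\nbeta\ngamma\n")
total = count_lines_chunked(sample, hint=8)
```

The point of the idiom is that the per-line work happens in a loop over an ordinary list, so the cost of calling into the file object is paid once per batch rather than once per line.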
On his box, line-at-a-time is >7x slower than the fastest Python methods,
which latter are usually close (depending on the platform) to Perl
line-at-a-time speeds. A factor of 7 is too large for most working
programmers to ignore in the interest of somebody else's notion of
theoretical purity <wink>. Seriously, speed is not a secondary
consideration to me when the gap is this gross, and in an area so visible
and common.
Alex's LinesOf appears a good predictor for how adding
fileinput.readlines(hint) would perform, since it appears to *be* that
(except off on its own). Then it buys a factor of 3 over line-at-a-time on
Neel's box but leaves a factor of 2.5 on the table. The cause of the latter
appears mostly to be the overhead of getting a Python method call into the
equation for each line returned.
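Alex's actual LinesOf code isn't reproduced in this message, so what follows is only a guess at its shape, written for present-day Python: a wrapper that refills a buffer with readlines(hint) and hands lines out through the sequence protocol. The attribute names and the default hint are assumptions. It does show where the per-line cost comes from: the for-loop calls the Python-level `__getitem__` once for every line.

```python
import io


class LinesOf:
    """Hypothetical reconstruction: sequence-style view of a file's lines."""

    def __init__(self, f, hint=64 * 1024):
        self.f = f
        self.hint = hint
        self.buffer = []   # lines from the most recent readlines(hint) call
        self.start = 0     # file-wide index of buffer[0]

    def __getitem__(self, index):
        # A for-loop calls this with index 0, 1, 2, ... until IndexError.
        offset = index - self.start
        if offset < 0:
            raise IndexError("LinesOf supports forward sequential access only")
        while offset >= len(self.buffer):
            # Current batch is used up: advance past it and fetch the next.
            self.start += len(self.buffer)
            offset = index - self.start
            self.buffer = self.f.readlines(self.hint)
            if not self.buffer:
                raise IndexError(index)  # end of file ends the loop
        return self.buffer[offset]


collected = []
for line in LinesOf(io.StringIO("one\ntwo\nthree\n"), hint=4):
    collected.append(line)
```

Doing the same buffering in C, as Jeff's xreadlines does, removes that per-line Python call, which is consistent with the gap Neel measured.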
Note that Jeff added .xreadlines() as a file object method at Neel's urging.
The way he started this is shown on the last line of Neel's table: a
function. If we threw out the fileinput and file method aspects, and just
added a new module xreadlines with a function xreadlines, then what? I bet
it would become as popular as the string module, and for good reason: it's
a specific approach that works, to a specific and common problem.
> ...
> And in the case of simple (x)range()es, I have yet to see a case
> where a 'real' list had significantly better performance than
> a generator.)
It varies by platform, but I don't think I've heard of variations larger
than 20% in either direction. 20% is nothing, though; in *this* case we're
talking order of magnitude. That's go/nogo territory.
> ...
> Gelukkig-Nieuwjaar-iedereen-ly y'rs
I understand people are passionate when reality clashes with the dream of a
wart-free language, but that's no reason to swear at me <wink>.
wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs - tim