[Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range

Tue, 2 Jan 2001 03:06:32 -0500

[Thomas Wouters]
> ...
> As for speed (which stays a secondary or tertiary consideration
> at best) do we really need the xreadlines method to accomplish
> that ?  Couldn't fileinput get almost the same performance using
> readlines() with a sizehint ?

There was a long email discussion among Jeff, Paul Prescod, Neel
Krishnaswami, and Alex Martelli about this.  I started getting copied on it
somewhere midstream, but didn't have time to follow it then (like I do now
<wink>).

About two weeks ago Neel summarized all the approaches then under
discussion:

"""
[Neel Krishnaswami]

...

Quick performance summary of the current solutions:

Slowest: for line in fileinput.input('foo'):     # Time 100
       : while 1: line = file.readline()         # Time 75
       : for line in LinesOf(open('foo')):       # Time 25
Fastest: for line in file.readlines():           # Time 10
         while 1: lines = file.readlines(hint)   # Time 10
         for line in xreadlines(file):           # Time 10

The difference in speed between the slowest and fastest is about
a factor of 10.

LinesOf is Alex's Python wrapper class that takes a file and
uses readlines() with a size-hint to present a sequence interface.
It's around half as fast as the fastest idioms, and 3-4 times
faster than while 1:. Jeff's xreadlines is essentially the same
thing in C, and is indistinguishable in performance from the
other fast idioms.

...

"""

On his box, line-at-a-time is >7x slower than the fastest Python methods,
which latter are usually close (depending on the platform) to Perl
line-at-a-time speeds.  A factor of 7 is too large for most working
programmers to ignore in the interest of somebody else's notion of
theoretical purity <wink>.  Seriously, speed is not a secondary
consideration to me when the gap is this gross, and in an area so visible
and common.

Alex's LinesOf appears a good predictor for how adding
fileinput.readlines(hint) would perform, since it appears to *be* that
(except off on its own).  Then it buys a factor of 3 over line-at-a-time on
Neel's box but leaves a factor of 2.5 on the table.  The cause of the latter
appears mostly to be the overhead of getting a Python method call into the
equation for each line returned.
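
To make the per-line overhead concrete, here is a minimal sketch of what a LinesOf-style wrapper might look like. Alex's actual class is not shown in this thread, so the method layout, the default sizehint, and the helper count_lines_inline are all assumptions made for illustration. The wrapper relies on the sequence protocol (__getitem__ raising IndexError at the end), which a for loop still accepts.

```python
import io


class LinesOf:
    """Hypothetical reconstruction: a sequence-style view of a file's lines,
    filled in chunks by readlines(sizehint). Forward sequential access only."""

    def __init__(self, fileobj, sizehint=64 * 1024):
        self.fileobj = fileobj
        self.sizehint = sizehint
        self.chunk = []  # lines returned by the most recent readlines() call
        self.base = 0    # overall index of chunk[0]

    def __getitem__(self, index):
        # This Python-level method runs once per line; that is the
        # overhead the paragraph above points at.
        offset = index - self.base
        if offset < 0:
            raise IndexError("LinesOf only supports forward access")
        while offset >= len(self.chunk):
            self.base += len(self.chunk)
            offset = index - self.base
            self.chunk = self.fileobj.readlines(self.sizehint)
            if not self.chunk:
                raise IndexError(index)  # end of file ends the for loop
        return self.chunk[offset]


def count_lines_inline(fileobj, sizehint=64 * 1024):
    """The same chunking written inline: one readlines() call per chunk,
    and no Python method call per line."""
    total = 0
    while True:
        lines = fileobj.readlines(sizehint)
        if not lines:
            break
        for line in lines:
            total += 1
    return total


# Usage: both forms see the same lines.
sample = "".join("line %d\n" % i for i in range(100))
wrapped = [line for line in LinesOf(io.StringIO(sample), sizehint=32)]
counted = count_lines_inline(io.StringIO(sample), sizehint=32)
```

The inline loop has the shape of the "while 1: lines = file.readlines(hint)" row in Neel's table; the class buys a plain for loop at the cost of one method call per line.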

Note that Jeff added .xreadlines() as a file object method at Neel's urging.
The way he started this is shown on the last line:  a function.  If we threw
out the fileinput and file method aspects, and just added a new module
xreadlines with a function xreadlines, then what?  I bet it would become as
popular as the string module, and for good reason:  it's a specific approach
that works, to a specific and common problem.
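
For illustration only, a standalone xreadlines function could be sketched as below. Jeff's patch is written in C and is not reproduced in this thread; this version uses a Python generator, a language feature that postdates this message, and the default sizehint is an arbitrary assumption.

```python
import io


def xreadlines(fileobj, sizehint=64 * 1024):
    """Hypothetical sketch: yield lines one at a time while reading the
    underlying file in chunks via readlines(sizehint)."""
    while True:
        lines = fileobj.readlines(sizehint)
        if not lines:
            return  # end of file
        yield from lines


# Usage: same loop shape as the last row of Neel's table.
for line in xreadlines(io.StringIO("first\nsecond\nthird\n")):
    print(line, end="")
```

The caller writes an ordinary for loop, and the chunking stays hidden inside the function.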

> ...
> And in the case of simple (x)range()es, I have yet to see a case
> where a 'real' list had significantly better performance than
> a generator.)

It varies by platform, but I don't think I've heard of variations larger
than 20% in either direction.  20% is nothing, though; in *this* case we're
talking order of magnitude.  That's go/nogo territory.

> ...
> Gelukkig-Nieuwjaar-iedereen-ly y'rs

I understand people are passionate when reality clashes with the dream of a
wart-free language, but that's no reason to swear at me <wink>.

wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs  - tim