[Python-Dev] RE: fileinput.py

Tim Peters tim.one@home.com
Fri, 5 Jan 2001 15:46:10 -0500

[Skip Montanaro]
> What do you think contributes to fileinput's relative disfavor?

Only half jokingly, because I never use it <wink>, and I don't think Fredrik
or Alex Martelli do either.  That means it rarely gets mentioned by the
c.l.py reply bots.  Plus it's not *used* anywhere in the Python
distribution, so nobody stumbles into it that way either.  Plus the docs
require more than one line to explain what it does, and get bogged down
describing the Awk-like (Perl took this from Awk) convolutions before the
simplest (one explicitly named file) case.  It *is* regularly mentioned in
the eternal "while 1:" debate, but that's it.
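Most of the "Awk-like convolutions" come down to per-line bookkeeping across several input files. Here is a minimal Python 3 sketch of that side of the module. `summarize` and the demo file names are hypothetical. It uses a `fileinput.FileInput` object rather than the module-level `fileinput.input()`, so no global state is left behind:

```python
import fileinput
import os
import tempfile


def summarize(paths):
    """Return (file name, per-file line number, overall line number, text) per line."""
    rows = []
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            rows.append((
                os.path.basename(stream.filename()),  # file currently being read
                stream.filelineno(),  # like Awk's FNR: restarts at 1 in each file
                stream.lineno(),      # like Awk's NR: keeps counting across files
                line.rstrip("\n"),
            ))
    return rows


if __name__ == "__main__":
    # Two tiny throwaway files, just to show how the counters behave.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(text)
            paths.append(path)
        for row in summarize(paths):
            print(row)
```

The demo prints `('a.txt', 1, 1, 'one')`, `('a.txt', 2, 2, 'two')`, and `('b.txt', 1, 3, 'three')`. The per-file counter resets when the file changes, but the overall counter does not.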

> This whole thread on Python's file reading performance was started
> by the eternal whine "why is Python so much slower than Perl?"

No, it started with Guido's objections to Jeff's xreadlines patch.  I
dragged Perl into it -- because, like it or not, that was the right thing to
do <wink>.

> which really means why is
>    line = f.readline()
>    while line:
>       process(line)
>       line = f.readline()
> so much slower than whatever that thing is in Perl that everybody
> uses as the be-all-end-all performance benchmark (something with
> <> in it).

"<FILE>" is simply Perl's way of spelling Python's FILE.readline() (and
FILE.readlines(), when <FILE> appears in an array context; and FILE.read()
when Perl's Awkish "record separator" is disabled; and ...).  "<>" without
an explicit filehandle does all the inherited-from-Awk magic with argv, else
that stuff doesn't come into play.  "<>" (without a filehandle) seems
rarely used in Perl practice, though, *except* in support of

your_shell_prompt> some_perl_script < some_file

That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
bet *most* Perl programmers don't even know "<>" is more general than that.
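For comparison, fileinput's default behaviour mirrors a bare "<>". With no file list, it reads the names in `sys.argv[1:]`. When there are none, or when a name is `"-"`, it falls back to standard input. A minimal Python 3 sketch, using a hypothetical `count_lines` helper:

```python
import fileinput
import io
import sys


def count_lines(files=None):
    """Count input lines the way a Perl `while (<>)` loop would see them.

    files=None means "use sys.argv[1:]". If that is empty too, fileinput
    reads standard input, which covers the `script < some_file` usage.
    """
    total = 0
    with fileinput.FileInput(files=files) as stream:
        for _line in stream:
            total += 1
    return total


if __name__ == "__main__":
    # Simulate `python script.py < some_file` without touching the real stdin:
    # "-" explicitly names standard input, which here is an in-memory stream.
    sys.stdin = io.StringIO("first\nsecond\n")
    print(count_lines(["-"]))
```

In real use, the script would simply call `count_lines()` and be run as `python script.py < some_file` or `python script.py a.txt b.txt`.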

> Given that fileinput is supposed to make the I/O loop in Python more
> familiar to those people wandering over from Perl (at least in part),
> you'd think that people would naturally gravitate to it.

I guess you didn't actually read the timing results <wink>.  Really, it's
been an outrageously slow way to do input.  That's better now, and I'm much
more likely now than I used to be to use

    for line in fileinput.input('file'):
        process(line)

instead of

    f = open('file')
    while 1:
        line = f.readline()
        if not line:
            break
        process(line)

The relative attraction of the former is obvious if it's reasonably quick.
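Here are the two idioms compared above, filled out into runnable Python 3. `process` stands in for whatever per-line work the caller does. Passing it in as a callback is an assumption made so the two loops can be checked against each other:

```python
import fileinput
import os
import tempfile


def with_fileinput(path, process):
    # fileinput hides the end-of-file test inside an ordinary for loop.
    with fileinput.FileInput(files=[path]) as stream:
        for line in stream:
            process(line)


def with_readline(path, process):
    # The "while 1:" idiom: read a line, stop on "" (end of file), process it.
    with open(path) as f:
        while 1:
            line = f.readline()
            if not line:
                break
            process(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        with open(path, "w") as f:
            f.write("alpha\nbeta\n")
        seen_a, seen_b = [], []
        with_fileinput(path, seen_a.append)
        with_readline(path, seen_b.append)
        print(seen_a == seen_b)  # both idioms see the same lines
```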
I don't really have any use for the Awk complications (note that I'm running
on Windows, though, and the shells here don't expand wildcards -- the Awk
gimmicks are much more useful on Unix systems).
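On Unix, the shell expands a pattern like `*.txt` before the script sees `sys.argv`. On Windows, the pattern arrives unexpanded. If you want Unix-like behaviour there, one option is to expand the arguments with `glob` before handing them to fileinput. Here is a minimal sketch under that assumption; `expand_args` and `read_lines` are hypothetical helper names:

```python
import fileinput
import glob
import os
import tempfile


def expand_args(args):
    """Expand wildcards that the shell left alone (as Windows shells do).

    An argument with no matches is passed through unchanged, so a literal
    name, or "-" for standard input, still reaches fileinput as typed.
    """
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches if matches else [arg])
    return expanded


def read_lines(args):
    with fileinput.FileInput(files=expand_args(args)) as stream:
        return [line.rstrip("\n") for line in stream]


if __name__ == "__main__":
    # Typical use would be read_lines(sys.argv[1:]); a temp directory stands
    # in for a real one so the demo is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt", "notes.log"):
            with open(os.path.join(tmp, name), "w") as f:
                f.write(name + "\n")
        print(read_lines([os.path.join(tmp, "*.txt")]))
```

On Unix, the extra expansion is mostly harmless, because an already-expanded filename simply matches itself.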

> Would it benefit from some exposure in the Python tutorial?

Heh -- that's a tough one.  The *simplest* case is the only one deserving of
promotion.  But in that case, Jeff's xreadlines is about as convenient and
much quicker.  I bet we'll all be afraid to change the tutorial to mention
either <0.9 wink>.
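Jeff's xreadlines has no direct counterpart in today's Python 3. File objects became their own line iterators in later 2.x releases, and the xreadlines module was eventually dropped. Here is a minimal sketch of the modern equivalent of the simple case; `count_matching` is a hypothetical example function:

```python
import os
import tempfile


def count_matching(path, needle):
    """Count lines containing `needle` by iterating over the file itself.

    A Python 3 text file yields one line per iteration and does its own
    buffering underneath, so no helper module is needed for this case.
    """
    count = 0
    with open(path) as f:
        for line in f:
            if needle in line:
                count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        with open(path, "w") as f:
            f.write("while 1:\nfor line in f:\nwhile 1:\n")
        print(count_matching(path, "while"))
```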

> Is it fast enough now to warrant the extra exposure?

Don't know.  It's the same speed as "while 1:" on *my* box now, but still 3x
slower than the double-loop method.
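The "double-loop method" isn't spelled out in this message. Assuming it means the chunked idiom, the outer loop pulls a batch of whole lines with `readlines(sizehint)` and the inner loop walks that batch in memory. Here is a minimal Python 3 sketch of that reading. `double_loop` and the 8192 default hint are illustrative choices, not taken from the thread:

```python
import os
import tempfile


def double_loop(path, process, sizehint=8192):
    """Read a batch of whole lines, then loop over the batch in memory."""
    with open(path) as f:
        while 1:
            # readlines(hint) stops once the lines read so far reach about
            # `hint` characters, but it always returns whole lines.
            lines = f.readlines(sizehint)
            if not lines:  # an empty list means end of file
                break
            for line in lines:
                process(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        with open(path, "w") as f:
            f.write("".join("line %d\n" % i for i in range(5)))
        seen = []
        double_loop(path, seen.append, sizehint=10)  # tiny hint forces several batches
        print(len(seen))
```

The speed advantage comes from the inner loop, which touches only an in-memory list, so there is one buffered read per batch rather than one method call per line.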

> just-whining-out-loud-ly y'rs

so-do-*you*-want-to-use-it-now?-ly y'rs  - tim