[Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range

Mon, 1 Jan 2001 17:34:03 -0500

[Guido]
> But is everyone's first thought to time the speed of Python vs. Perl?

It's few people's first thought.  It's impossible for bilingual programmers
(or dabblers, or evaluators) not to notice *soon*, though, because:

> Why does it hurt so much that this is a bit slow?

Factors of 2 to 5 aren't "a bit" -- they're obvious when they happen, but
the *cause* is not.  To judge from a decade of c.l.py gripes, most people
write it off to "huh -- guess Python is just slow"; the rest eventually
figure out that their text input is the bottleneck (Tom Christiansen never
got this far <0.5 wink>), but then don't know what to do about it.

At this point I'm going to insert two anonymized pvt emails from last year:

-----Original Message #1 -----

From: TTT
Sent: Monday, March 13, 2000 2:29 AM
To: GGG
Subject: RE: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison

GGG, note especially figure 4 in Lutz Prechelt's report:

>   http://wwwipd.ira.uka.de/~prechelt/Biblio/#jccpprtTR

The submitted Python programs had by far the largest variability in how long
it took to load the dictionary.  My input loop is probably typical of the
"fast" Python programs, which indeed beat most (but not all) of the fastest
Perl ones here:

class Dictionary:
    ...

    def fill_from_file(self, f, BUFFERSIZE=500000):
        """f, BUFFERSIZE=500000 -> fill dictionary from file f.

        f must be an open file, or other object with a readlines()
        method.  It must contain one word per line.  Optional arg
        BUFFERSIZE is used to chunk up input for efficiency, and is
        roughly the # of bytes read at a time.
        """

        addword = self.addword
        while 1:
            lines = f.readlines(BUFFERSIZE)
            if not lines:
                break
            for line in lines:
                addword(line[:-1])  # chop trailing newline

Comparable Perl may have been the one-liner:

    chomp(@words = <>); addword($_) for @words;

which may account for why Perl's memory use was uniformly higher than
Python's.

Whatever, you really need to be a Python expert to dream up "the fast way"
to do Python input!  Hire me, and I'll fix that <wink>.
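
For contrast, the input loop a newcomer would most likely write at the time
calls readline() once per line, which is the slow pattern the chunked version
above avoids.  A minimal sketch, not taken from the report: slow_fill and the
in-memory sample are hypothetical names, and it is written to run on a current
Python, so it illustrates the shape of the loop rather than its timing:

```python
import io


def slow_fill(f, addword):
    """Feed each line of f to addword, one readline() call per line."""
    while True:
        line = f.readline()
        if not line:        # an empty string means end of file
            break
        addword(line[:-1])  # chop trailing newline, as in the chunked version


# Hypothetical usage: collect the words from an in-memory "file".
words = []
slow_fill(io.StringIO("apple\nbanana\ncherry\n"), words.append)
print(words)
```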

nothing-like-blackmail-before-going-to-bed-ly y'rs  - TTT

-----Original Message #2 -----

From: GGG
Sent: Monday, March 13, 2000 7:08 AM
To: TTT
Subject: Re: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison

Agreed.  readlines(BUFFERSIZE) is a crock.  In fact, ``for i in
f.readlines()'' should use lazy evaluation -- but that will have to wait for
Py3K unless we add hints so that readlines knows it is being called from a
for loop.

--GGG

-----Back to 2001 -----

I took TTT's advice and read Lutz's report <wink>.  I agree with GGG that
hiding this in .readlines() would be maximally elegant.  xreadlines supplies
most of the lazy machinery GGG favored.  I don't know how hard it would be
to supply the rest of it, but it's such a frequent bitching point that I
would rather point people to an explicit .xreadlines() hack than either
(a) try to convince them that they "shouldn't" care about the speed as much
as they claim to, or (b) try to explain the double-loop buffering method.
I'd personally rather use an explicit .xreadlines() hack than code the
double-loop buffering too, and don't see an obvious way to do better than
that right now.
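
One way to picture what an explicit .xreadlines() buys: the double loop
stays, but it is written once and hidden behind something a plain for loop
can consume.  A minimal sketch under stated assumptions: xreadlines_sketch is
a hypothetical name, it uses a generator for brevity (a later Python
feature), and it is not the code from patch #102915:

```python
import io


def xreadlines_sketch(f, sizehint=500000):
    """Yield the lines of f one at a time, reading roughly sizehint bytes per chunk."""
    while True:
        lines = f.readlines(sizehint)  # outer loop: fetch one chunk of whole lines
        if not lines:                  # an empty list means end of file
            break
        for line in lines:             # inner loop: hand the lines out one by one
            yield line


# Hypothetical usage: the caller sees a single loop, and only one chunk of
# lines is held in memory at a time.
sample = io.StringIO("alpha\nbeta\ngamma\n")
seen = [line[:-1] for line in xreadlines_sketch(sample, sizehint=16)]
print(seen)
```

The caller's loop looks the same as `for line in f.readlines()`, which is
why it is easier to point people at than the double loop itself.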

>> reading-text-files-is-very-common-ly y'rs  - tim

> So is worrying about performance without a good reason...

Indeed it is.  I'm persuaded that many people making this specific complaint
have a legitimate need for more speed, though, and that many don't persist
with Python long enough to find out how to address this complaint (because
the double-loop method is too obscure for a newbie to dream up).  That makes
this hack score extraordinarily high on my benefit/harm ratio scale (in P3K
xreadlines can be deprecated in favor of readlines <0.9 wink>).

heck-it-doesn't-even-require-a-new-keyword-ly y'rs  - tim