[Python-Dev] xreadline speed vs readlines_sizehint

Tim Peters tim.one@home.com
Wed, 10 Jan 2001 18:06:05 -0500


[Mark Favas]
> Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same
> behaviour as Tim's WinBox wrt the new xreadline and the double-loop
> readlines (so it's not just something funny with MS (not that there's
> not anything funny with MS...)):
>
> total 131426612 chars and 514216 lines

You average over 255 chars/line?  Really?  What kind of file are you
reading?  I don't really want to measure the speed of line-at-a-time input
on binary files where "line" doesn't actually make sense <0.6 wink>.
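
The "over 255" figure comes straight from the totals Mark quoted. A quick sketch of the arithmetic, using only those two numbers (the variable names are mine):

```python
# Totals taken from the quoted line:
#   "total 131426612 chars and 514216 lines"
total_chars = 131426612
total_lines = 514216

# Mean line length, newline included.
avg_chars_per_line = total_chars / total_lines

print("%.1f chars/line" % avg_chars_per_line)
```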

> count_chars_lines     5.450  5.066
> readlines_sizehint    4.112  4.083
> using_fileinput      10.928 10.916
> while_readline       11.766 11.733
> for_xreadlines        3.569  3.533

Guido pointed out that his readlines_sizehint test forced use of a 1 MB
buffer (passed explicitly in the call, not only as the default value).  For
whatever reason, that was significantly slower than using an 8 KB sizehint
on my box.
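
For readers who haven't seen the timing script, here is a minimal sketch of the two loop shapes being compared. It is not Guido's actual test code: the function names are borrowed from the timing table, the bodies are my reconstruction in present-day Python, and make_sample() is a hypothetical helper that only exists to give the loops something to read.

```python
import os
import tempfile


def while_readline(path):
    """Count lines with one readline() call per line."""
    count = 0
    with open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:          # empty result means EOF
                break
            count += 1
    return count


def readlines_sizehint(path, sizehint=8 * 1024):
    """Count lines with the double loop: the outer loop fetches a batch
    of roughly sizehint bytes, the inner loop walks that batch."""
    count = 0
    with open(path, "rb") as f:
        while True:
            lines = f.readlines(sizehint)
            if not lines:         # empty list means EOF
                break
            for line in lines:
                count += 1
    return count


def make_sample(n_lines=1000):
    """Hypothetical helper: write a small sample file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n_lines):
            f.write(("line %d\n" % i).encode("ascii"))
    return path
```

Passing sizehint=1024 * 1024 instead of the 8 KB default above gives the 1 MB variant; both should report the same line count, and only the timing is expected to differ.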

Another oddity is that while_readline is slower than using_fileinput for
you.  From that I take it Python config does *not* #define

     HAVE_GETC_UNLOCKED

on your platform.  If that's true (or esp. if it's not!), would you do me a
favor?  Recompile fileobject.c with

     USE_MS_GETLINE_HACK

#define'd, try the timing test again (while_readline is the most interesting
test for this), and run the test_bufio.py std test to make sure you're
actually getting the right answers.
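
If you want a quick sanity check in addition to the std test, something along these lines would do. This is a sketch, not the contents of test_bufio.py: the choice of line lengths near powers of two (where buffer boundaries are likely to fall) is my assumption, and both function names are made up for illustration.

```python
import os
import tempfile


def boundary_lengths(max_power=14):
    """Line lengths just below, at, and just above each power of two."""
    lengths = []
    for k in range(max_power + 1):
        size = 2 ** k
        lengths.extend([size - 1, size, size + 1])
    return lengths


def roundtrip_ok(lengths):
    """Write one line per length, read them back with readline(),
    and report whether every line came back unchanged."""
    expected = [b"x" * n + b"\n" for n in lengths]
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.writelines(expected)
        got = []
        with open(path, "rb") as f:
            while True:
                line = f.readline()
                if not line:
                    break
                got.append(line)
    finally:
        os.remove(path)
    return got == expected
```

Calling roundtrip_ok(boundary_lengths()) should return True on a correctly working readline.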

At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack
whenever HAVE_GETC_UNLOCKED isn't available.  I'd be surprised if
ms_getline_hack failed to work correctly on any platform; a bigger unknown
(to me) is whether it will yield a speedup.  So far it yields a large
speedup on Windows, and looks like it yields a speedup equal to what
getc_unlocked() yields on Linux and Solaris.  Info on a platform from Mars
(like Tru64 Unix <wink>) would be valuable in deciding whether to boost
that +0.5.

don't-want-your-python-to-run-slower-than-possible-if-possible-ly
    y'rs  - tim