[Python-Dev] xreadline speed vs readlines_sizehint
Wed, 10 Jan 2001 18:06:05 -0500
> Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same
> behaviour as Tim's WinBox wrt the new xreadline and the double-loop
> readlines (so it's not just something funny with MS (not that there's
> not anything funny with MS...)):
> total 131426612 chars and 514216 lines
You average over 255 chars/line? Really? What kind of file are you
reading? I don't really want to measure the speed of line-at-a-time input
on binary files where "line" doesn't actually make sense <0.6 wink>.
> count_chars_lines     5.450   5.066
> readlines_sizehint    4.112   4.083
> using_fileinput      10.928  10.916
> while_readline       11.766  11.733
> for_xreadlines        3.569   3.533
Guido pointed out that his readlines_sizehint test forced use of a 1Mb
buffer (in the call, not only the default value). For whatever reason, that
was significantly slower than using an 8Kb sizehint on my box.
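
For anyone following along without the benchmark script in hand, the two loop
shapes being compared look roughly like the sketch below. This is not the
actual timing script: the function names are borrowed from the timing table,
the bodies are only an assumption about what such loops do, the 8Kb default is
taken from the paragraph above, and it is written for current Python with
io.StringIO standing in for a real file.

```python
import io


def readlines_sizehint(f, sizehint=8 * 1024):
    """Double loop: fetch a batch of whole lines, then walk the batch."""
    nchars = nlines = 0
    while True:
        # readlines(hint) stops once the lines read so far reach about
        # `hint` characters, so each batch holds at least one whole line.
        lines = f.readlines(sizehint)
        if not lines:
            break
        for line in lines:
            nchars += len(line)
            nlines += 1
    return nchars, nlines


def while_readline(f):
    """Single loop: one readline() call per line."""
    nchars = nlines = 0
    while True:
        line = f.readline()
        if not line:
            break
        nchars += len(line)
        nlines += 1
    return nchars, nlines


if __name__ == "__main__":
    data = "spam\n" * 1000  # stand-in for a real input file
    print(readlines_sizehint(io.StringIO(data)))
    print(while_readline(io.StringIO(data)))
```

Both functions return the same (chars, lines) pair; the only difference is how
many calls into the file object it takes to get there, which is what the
timings are measuring.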
Another oddity is that while_readline is slower than using_fileinput for
you. From that I take it Python config does *not* #define
on your platform. If that's true (or esp. if it's not!), would you do me a
favor? Recompile fileobject.c with
#define'd, try the timing test again (while_readline is the most interesting
test for this), and run the test_bufio.py std test to make sure you're
actually getting the right answers.
At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack
whenever HAVE_GETC_UNLOCKED isn't available. I'd be surprised if
ms_getline_hack failed to work correctly on any platform; a bigger unknown
(to me) is whether it will yield a speedup. So far it yields a large
speedup on Windows, and looks like a speedup equal to getc_unlocked() yields
on Linux and Solaris. Info on a platform from Mars (like Tru64 Unix <wink>)
would be valuable in deciding whether to boost +0.5.
y'rs - tim