[Python-Dev] xreadline speed vs readlines_sizehint
Mark Favas
m.favas@per.dem.csiro.au
Thu, 11 Jan 2001 11:40:18 +0800
[Tim responded]
>>
>> total 131426612 chars and 514216 lines
>You average over 255 chars/line? Really? What kind of file are you
>reading? I don't really want to measure the speed of line-at-a-time
>input on binary files where "line" doesn't actually make sense <0.6 wink>.
Real-life input, my boy! It's actually a syslog from my mailserver,
consisting mainly of sendmail log messages, and I have a current need to
process these things (MS Exchange, corrupted database, clobbered backup
tapes), so this thread came along at the right time...
>Guido pointed out that his readlines_sizehint test forced use of a 1Mb
>buffer (in the call, not only the default value). For whatever
>reason, that was significantly slower than using an 8Kb sizehint on my
>box.
Removing the buffer size arg in the call to readlines_sizehint results
in this (using up-to-the-minute CVS):
total 131426612 chars and 514216 lines
count_chars_lines      4.922    4.916
readlines_sizehint     3.881    3.850
using_fileinput       10.371   10.366
while_readline        10.943   10.916
for_xreadlines         2.990    2.967
and with an 8Kb sizehint:
total 131426612 chars and 514216 lines
count_chars_lines      5.241    5.216
readlines_sizehint     2.917    2.900
using_fileinput       10.351   10.333
while_readline        10.990   10.983
for_xreadlines         2.877    2.867
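
For readers without the benchmark script to hand, here is a rough sketch of the three read loops being compared. It is not the script that produced the numbers above: the function names and the make_sample helper are made up for illustration, and it is written for current Python, where xreadlines no longer exists and iterating over the file object is the nearest equivalent.

```python
import os
import tempfile


def count_readlines_sizehint(path, sizehint=8 * 1024):
    """Count chars and lines by pulling batches with readlines(sizehint)."""
    chars = lines = 0
    with open(path, "r", newline="") as f:
        while True:
            batch = f.readlines(sizehint)  # roughly sizehint worth of whole lines
            if not batch:
                break
            for line in batch:
                chars += len(line)
                lines += 1
    return chars, lines


def count_while_readline(path):
    """Count chars and lines with one readline() call per line."""
    chars = lines = 0
    with open(path, "r", newline="") as f:
        while True:
            line = f.readline()
            if not line:  # empty string means end of file
                break
            chars += len(line)
            lines += 1
    return chars, lines


def count_for_line_in_file(path):
    """Count chars and lines by iterating the file (stand-in for xreadlines)."""
    chars = lines = 0
    with open(path, "r", newline="") as f:
        for line in f:
            chars += len(line)
            lines += 1
    return chars, lines


def make_sample(n_lines=1000):
    """Hypothetical helper: write a small log-like text file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w", newline="") as f:
        for i in range(n_lines):
            f.write("line %d: sendmail-style filler text\n" % i)
    return path
```

All three should report identical totals for the same file; only the number of calls made per line differs, which is where the timing gaps come from.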
>Another oddity is that while_readline is slower than using_fileinput
>for you. From that I take it Python config does *not* #define
>
> HAVE_GETC_UNLOCKED
>
>on your platform. If that's true
Nope, HAVE_GETC_UNLOCKED is indeed #define'd
>(or esp. if it's not!), would you do me a
>favor? Recompile fileobject.c with
>
> USE_MS_GETLINE_HACK
>
>#define'd, try the timing test again (while_readline is the most
>interesting test for this), and run the test_bufio.py std test to make
>sure you're actually getting the right answers.
Sure:
With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd (although
defining the former makes the latter def irrelevant):
(test_bufio also OK)
total 131426612 chars and 514216 lines
count_chars_lines      5.056    5.050
readlines_sizehint     3.771    3.667
using_fileinput       11.128   11.116
while_readline         8.287    8.233
for_xreadlines         3.090    3.083
With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just for
completeness):
total 131426612 chars and 514216 lines
count_chars_lines      4.916    4.900
readlines_sizehint     3.875    3.867
using_fileinput       14.404   14.383
while_readline       322.728  321.837
for_xreadlines         7.113    7.100
So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement
<grin>
--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA