[Python-Dev] Re: xreadline speed vs readlines_sizehint
Mark Favas
m.favas@per.dem.csiro.au
Thu, 11 Jan 2001 15:26:37 +0800
[Tim speculates on getc_unlocked and his ms_getline_hack]:
>
> So ms_getline_hack is significantly faster on your box (I'm only
> looking at while_readline: 11 using getc_unlocked, 8.3 using
> ms_getline_hack). There are only two reasons I can imagine for that:
>
> 1. Your vendor optimizes the inner loop in fgets (as all vendors
> should, but few do).
Digital engineering, Compaq management/marketing <0.6 wink>
>
> and/or
>
> 2. Despite the long average length of your lines, many of them are
> nevertheless shorter than 200 chars, and so all the pain
> ms_getline_hack endures to avoid a realloc pays off.
>
> Unfortunately, there's not enough info to figure out if either, both,
> or none of those are on-target. It's such a large percentage
> speedup, though, that my bet goes primarily to #1 -- unless realloc
> is really pig slow on your box.
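For comparison, the getc_unlocked approach Tim measures against amounts to a per-character loop like the one below. This is a minimal sketch, not the code actually in Python's fileobject.c. The function name read_line_getc, the starting size of 200 and the doubling growth policy are assumptions made for illustration. It takes the stream lock once per line with flockfile(), then pulls characters one at a time, so every byte goes through this loop rather than through a (possibly vendor-tuned) fgets.

```c
#define _POSIX_C_SOURCE 200809L  /* for flockfile/getc_unlocked under strict -std modes */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from fp one character at a time with
   getc_unlocked(), doubling the buffer with realloc() when it fills up.
   Returns a malloc'd, NUL-terminated buffer (including the '\n', if any)
   and stores its length in *len; returns NULL at EOF with nothing read
   or on allocation failure. */
char *read_line_getc(FILE *fp, size_t *len)
{
    size_t cap = 200, n = 0;   /* assumed starting size */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    flockfile(fp);             /* lock the stream once for the whole line */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (n + 1 >= cap) {    /* keep room for the terminating NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                funlockfile(fp);
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;             /* end of line */
    }
    funlockfile(fp);
    if (n == 0) {              /* EOF before any character was read */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    *len = n;
    return buf;
}
```

Each call returns one line; the caller frees it and calls again until NULL comes back.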
The lines range in length from 96 to 747 characters, with 11% at 233, 17%
at 252 and 52% at 254 characters, so #1 looks promising: most lines are
long enough to trigger a realloc. Cranking up INITBUFSIZE in
ms_getline_hack from 200 to 260 improves things again, by another 25%
(while_readline drops from 8.3 to 6.1):
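Why INITBUFSIZE matters can be seen in a simplified fgets-based reader. This is a hypothetical sketch of the general idea, not ms_getline_hack itself: the name read_line_fgets, the initsize parameter (standing in for INITBUFSIZE), the doubling growth and the grew flag are all illustrative assumptions. A line that fits in initsize - 1 characters, newline included, is read by a single fgets call with no realloc; a 254-character line plus its newline and terminating NUL needs 256 bytes, so 200 forces a realloc while 260 does not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader: let fgets() run the inner loop, starting with a
   buffer of initsize bytes (assumed >= 2). Sets *grew to 1 if realloc()
   was needed. Uses strlen(), so unlike a production version it does not
   cope with embedded NUL bytes (fgets does not report how much it read).
   Returns a malloc'd line, or NULL at EOF / on allocation failure. */
char *read_line_fgets(FILE *fp, size_t initsize, size_t *len, int *grew)
{
    size_t cap = initsize, n = 0;
    char *buf = malloc(cap);

    *grew = 0;
    if (buf == NULL)
        return NULL;
    for (;;) {
        /* fgets stores at most cap - n - 1 chars, then a NUL */
        if (fgets(buf + n, (int)(cap - n), fp) == NULL)
            break;                     /* EOF or error: buffer unchanged */
        n += strlen(buf + n);
        if (n > 0 && buf[n - 1] == '\n')
            break;                     /* got a complete line */
        if (n + 1 < cap)
            break;                     /* short read without '\n': EOF */
        /* buffer full and no newline yet: enlarge and keep reading */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
        *grew = 1;
    }
    if (n == 0) {                      /* nothing read at all */
        free(buf);
        return NULL;
    }
    *len = n;
    return buf;
}
```

Reading a file that holds one 254-character line with initsize 200 sets grew to 1; with initsize 260 it stays 0, which is the realloc the larger INITBUFSIZE avoids.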
total 131426612 chars and 514216 lines
count_chars_lines     5.081   5.066
readlines_sizehint    3.743   3.717
using_fileinput      11.113  11.100
while_readline        6.100   6.083
for_xreadlines        3.027   3.033
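A quick check of the figures above: the "total" line gives an average of about 255.6 characters per line, just under the new INITBUFSIZE of 260, and while_readline going from 8.3 to 6.1 is a reduction of roughly 26.5% in run time, matching the "another 25%". The two helper names below are illustrative only; the timings are taken in whatever units the benchmark script reports.

```c
/* Average characters per line, from the "total ... chars and ... lines" output. */
double average_line_length(double chars, double lines)
{
    return chars / lines;
}

/* Fractional reduction in run time going from old_time to new_time. */
double time_reduction(double old_time, double new_time)
{
    return (old_time - new_time) / old_time;
}
```

With the posted numbers, average_line_length(131426612, 514216) is about 255.59 and time_reduction(8.3, 6.1) is about 0.265.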
Apart from the name <grin>, I like ms_getline_hack...
tho'-a-factor-of-100-makes-xreadlines-a-welcome-addition!-ly y'rs
--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA