[Python-Dev] Re: xreadline speed vs readlines_sizehint

Mark Favas m.favas@per.dem.csiro.au
Thu, 11 Jan 2001 15:26:37 +0800


[Tim speculates on getc_unlocked and his ms_getline_hack]:
> 
> So ms_getline_hack is significantly faster on your box (I'm only
> looking at while_readline:  11 using getc_unlocked, 8.3 using 
> ms_getline_hack).  There are only two reasons I can imagine for that:
> 
> 1. Your vendor optimizes the inner loop in fgets (as all vendors
> should, but few do).

Digital engineering, Compaq management/marketing <0.6 wink>
> 
> and/or
> 
> 2. Despite the long average length of your lines, many of them are
> nevertheless shorter than 200 chars, and so all the pain
> ms_getline_hack endures to avoid a realloc pays off.
> 
> Unfortunately, there's not enough info to figure out if either, both,
> or none of those are on-target.  It's such a large percentage
> speedup, though, that my bet goes primarily to #1 -- unless realloc
> is really pig slow on your box.

The lines range in length from 96 to 747 characters, with 11% @ 233, 17%
@ 252 and 52% @ 254 characters, so #1 looks promising - most lines are
long enough to trigger a realloc. Cranking up INITBUFSIZE in
ms_getline_hack to 260 from 200 improves things again, by another 25%:
total 131426612 chars and 514216 lines
count_chars_lines     5.081  5.066
readlines_sizehint    3.743  3.717
using_fileinput      11.113 11.100
while_readline        6.100  6.083
for_xreadlines        3.027  3.033
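
A rough sanity check on those numbers, as a Python sketch rather than
anything taken from ms_getline_hack itself. It uses only the three
line-length shares quoted above; the other 20% of lines are treated as
unknown, so the result is a lower and upper bound. The names
KNOWN_SHARES, fit_bounds and speedup_percent are made up for this
illustration, and "fits" is assumed to mean simply length < buffer size,
ignoring how the real code accounts for the newline and terminator.

```python
# Percentage of lines at each reported length (from the message above).
# The remaining 20% lie somewhere between 96 and 747 chars; lengths unknown.
KNOWN_SHARES = {233: 11, 252: 17, 254: 52}


def fit_bounds(bufsize, shares=KNOWN_SHARES):
    """Return (low, high) bounds on the % of lines that fit in bufsize.

    Assumption: a line fits when its length is strictly below bufsize.
    """
    known_fit = sum(pct for length, pct in shares.items() if length < bufsize)
    unknown = 100 - sum(shares.values())
    # Low bound: none of the unknown lines fit; high bound: all of them do.
    return known_fit, known_fit + unknown


def speedup_percent(old_seconds, new_seconds):
    """Relative reduction in run time, as a percentage of the old time."""
    return (old_seconds - new_seconds) / old_seconds * 100


for size in (200, 260):
    low, high = fit_bounds(size)
    print(f"INITBUFSIZE={size}: between {low}% and {high}% of lines fit")

# while_readline: 8.3 s at INITBUFSIZE 200 (Tim's figure) vs 6.100 s at 260.
print(f"while_readline speedup: {speedup_percent(8.3, 6.100):.1f}%")
```

Under those assumptions, at most a fifth of the lines can avoid a realloc
with a 200-char buffer, while at least four fifths avoid one at 260, which
is consistent with the drop in the while_readline time.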

Apart from the name <grin>, I like ms_getline_hack...

tho'-a-factor-of-100-makes-xreadlines-a-welcome-addition!-ly y'rs

-- 
Mark Favas  -   m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA