[Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range

Tim Peters tim.one@home.com
Fri, 12 Jan 2001 01:54:47 -0500

[Tim, on for_xreadlines vs readlines_sizehint, after disabling the
 default 1Mb buffer size in the latter]
> They're indistinguishable then on my box (on one run xreadlines
> is .1 seconds  (out of around 7.6 total) quicker, on another
> readlines_sizehint), *provided* that I specify the same buffer
> size (8192) that xreadlines uses internally.  However, if I even
> double that, readlines_sizehint is uniformly about 10% slower.  It's
> also a tiny bit slower if I cut the sizehint buffer size to 4096.

> 8192 happens to be the size of the stack-allocated buffer readlines()
> uses, and also the stdio BUFSIZ parameter, on many systems.  Look for
> SMALLCHUNK in fileobject.c.
> Would it make sense to tie the two constants together more to tune
> this optimally even when BUFSIZ is different?

Have to repeat what I first said:

> I'm afraid Mysteries will remain no matter how many
> person-decades we spend staring at this <0.5 wink> ...

I'm repeating that because BUFSIZ is 4096 on WinTel, but SMALLCHUNK (8192)
worked best for me.  Now we're in some complex balancing act among how often
the outer loop needs to refill the readlines_sizehint buffer; how out of
whack the latter is with the platform stdio buffer; whether platform malloc
takes only twice as long to allocate space for 2*N strings as for N; and, if
the readlines buffer is too large, at exactly which point the known Win9x
eventually-quadratic-time behavior of PyList_Append starts to kick in.  I
can't out-think all that.  Indeed, I can't out-think any of it <frown>.
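
For anyone wanting to rerun the comparison on their own box, here is a minimal sketch of the two loops being timed. It is not the original benchmark script: the function names, the sample-file helper and the line count are all made up for illustration, and plain file iteration stands in for the xreadlines loop (an assumption, since xreadlines does not exist in current Python). The sizehint argument is the knob discussed above.

```python
import os
import tempfile
import time


def count_lines_sizehint(path, sizehint=8192):
    """Outer loop refills a batch of lines via readlines(sizehint)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            lines = f.readlines(sizehint)
            if not lines:          # empty list means end of file
                break
            for _line in lines:    # inner loop over the current batch
                total += 1
    return total


def count_lines_iter(path):
    """Line-at-a-time iteration; stand-in for the xreadlines loop."""
    total = 0
    with open(path, "rb") as f:
        for _line in f:
            total += 1
    return total


def make_sample(n_lines=100000):
    """Hypothetical helper: write a throwaway file of n_lines short lines."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of the sample file\n" % i)
    return path


def time_it(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    sample = make_sample()
    try:
        for hint in (4096, 8192, 16384):
            n, secs = time_it(count_lines_sizehint, sample, hint)
            print("readlines(%d): %d lines in %.3fs" % (hint, n, secs))
        n, secs = time_it(count_lines_iter, sample)
        print("iteration: %d lines in %.3fs" % (n, secs))
    finally:
        os.remove(sample)
```

Single runs like this are noisy, so differences of a few percent should not be read into without repeating each measurement several times.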

After staring at the code, I expect my "only a tiny bit slower" was an
illusion:  if 0 < sizehint <= SMALLCHUNK, sizehint appears to have no effect
on the operation of file_readline.

BTW, changing fileobject.c's SMALLCHUNK to a copy of BUFSIZ didn't make any
difference on Windows.