[Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range

Tim Peters tim.one@home.com
Wed, 3 Jan 2001 04:32:55 -0500


[Guido & Tim, wonder about faking getline-like functionality for Windows]

The attached is kinda baffling.  The std tests pass with it, and it changes my test
timings from:

count_chars_lines    14.780 14.772
readlines_sizehint    9.390  9.375
using_fileinput      66.130 66.157
while_readline       30.380 30.337

to:

count_chars_lines    14.880 14.854
readlines_sizehint    9.280  9.302
using_fileinput      48.610 48.589
while_readline       13.450 13.451

Big win?  You bet.  But ...

The baffling parts:

1. That Perl still takes only 6 seconds in line-at-a-time mode.

2. I originally wrote a getline workalike, instead of building directly into a PyString
buffer.  That made my test run *slower*, and I'm talking factor of 2, not a yawn.  To
judge from my usually silent disk (I've got 256Mb RAM on this box), I'm afraid the extra
mallocs required may have triggered the horrid Win9x malloc-thrashing problem I wrote
about while I was still at Dragon.  Consider that another vote for Vlad's PyMalloc --
we've got no handle on x-platform dynamic memory behavior now.  Python's destiny is to
replace both the platform OS and libc anyway <0.9 wink>.

The scary parts:

+ As the "XXX" comments indicate, this is full of little insecurities.

+ Another one I just thought of:  if the user's last operations on the fp were two or more
consecutive ungetc calls, all bets are off.  But then MS doesn't define what happens then
either.

+ This is much less ambitious than I recall Perl's code being:  it doesn't try to guess
anything about the file, and effectively captures only what would happen if you could
unroll the guts of a getc-in-a-loop and optimize the snot out of it.  The good news is
that this means it's much easier to maintain (it touches only two of the MS FILE* fields,
and in ways that are pretty obviously correct).  The bad news is that this seems also
pretty clearly all there *is* to be gotten out of breaking into the FILE* abstraction for
the particular test case I'm using; and increasing TUNEME doesn't save any time at all:
the sucker is flying at full speed already.

+ It drops (line-at-a-time) drops to a little under 13 seconds if I comment out the thread
macros.

+ I haven't looked at Perl's implementation in a year, and they must have dreamt up
another trick since then.  That's a "scary part" indeed to anyone who has ever looked at
Perl's implementation.

retreating-into-a-fetal-position-ly y'rs  - tim


Anyone wants to play, the sandbox is fileobject.c.  Do two things:  insert this new chunk
somewhere above get_line:

#ifdef MS_WIN32
static PyObject*
win32_getline(FILE *fp)
{
	/* XXX ignores thread safety -- but so does MS's getc macro! */
	PyObject* v;
	char* pBuf;	/* next free slot in v's buffer */
	/* MS's internals are declared in terms of ints, but it's a sure bet
	 * that won't last forever -- use size_t now & live w/ the casting;
	 * ditto for Python's routines
	 */
	size_t total_buf_size = 100;
	size_t free_buf_size = total_buf_size;
#define TUNEME 1000	/* how much to boost the string buffer when exhausted */

	v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
	if (v == NULL)
		return NULL;
	pBuf = BUF(v);
	Py_BEGIN_ALLOW_THREADS
	for (;;) {
		char ch;
		size_t ms_cnt;	/* FILE->_cnt shadow */
		char* ms_ptr;	/* FILE->_ptr shadow */
		size_t max_to_copy, i;
		/* stdio buffer empty or in unknown state; rather
		 * than try to simulate every quirk of MS's internals,
		 * let the MS macros deal with it.
		 */
		/* XXX we also wind up here when we simply run out of string
		 * XXX buffer space, but I'm not sure I care:  making this a
		 * XXX double-nested loop doesn't seem worth it
		 */
		ch = getc(fp);
		if (ch == EOF)
			break;
		/* make sure we've got some breathing room */
		if (free_buf_size < 100) {
			size_t currentoffset = pBuf - BUF(v);
			total_buf_size += TUNEME;  /* XXX check for overflow */
			Py_BLOCK_THREADS
			if (_PyString_Resize(&v, (int)total_buf_size) < 0)
				return NULL;
			Py_UNBLOCK_THREADS
			pBuf = BUF(v) + currentoffset;
			free_buf_size = TUNEME;
		}
		/* ch wasn't EOF, so store it */
		*pBuf++ = ch;
		--free_buf_size;
		if (ch == '\n') {
			break;
		}
		ms_cnt = (size_t)fp->_cnt;
		if (!ms_cnt) {
			/* XXX this is a slow way to read one character at
			 * XXX a time if, e.g., the stream is unbuffered
			 */
			continue;
		}
		/* payback!  now we don't have to check for buffer overflows or
		 * EOF inside the loop, nor does the macro _filbuf() branch force
		 *  _ptr and _cnt in and out of memory on each iteration
		 */
		ms_ptr = fp->_ptr;
		assert(ms_cnt > 0);
		i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;
		do {
			/* XXX unclear to me why MS's getc macro does "& 0xff" */
			*pBuf++ = ch = *ms_ptr++ & 0xff;
		} while (--i && ch != '\n');
		/* update the shadows & counters */
		fp->_ptr = ms_ptr;
		free_buf_size -= max_to_copy - i;
		fp->_cnt = ms_cnt - (max_to_copy - i);
		if (ch == '\n')
			break;
	}
	Py_END_ALLOW_THREADS
	_PyString_Resize(&v, pBuf - BUF(v));
	return v;
}
#endif

2. Within get_line, add this before the #endif (this is the getline #if block):

#elif defined(MS_WIN32)
	if (n == 0) {
		return win32_getline(fp);
	}