[Python-Dev] xreadlines : readlines :: xrange : range

Guido van Rossum guido@python.org
Tue, 02 Jan 2001 17:46:00 -0500


> On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote:
> >into the FILE* representation in platform-dependent ways.  It's a shame that
> >almost all vendors missed that fgets was defined as a primitive by the C
> >committee precisely so that vendors *could* pull this speed trick under the
> >covers.  It's also a shame that Perl did it for them <wink>.
> 
> So, should Python be changed to use fgets(), available on all ANSI C
> platforms, rather than the glibc-specific getline()?  That would be
> more complicated than the brain-dead easy course of using getline(),
> which is obviously why I didn't do it; PyFile_GetLine() had annoyingly
> complicated logic.

You mean get_line(), which indeed has a complicated API and
corresponding logic: the argument may be a max length, or 0 to
indicate arbitrary length, or negative to indicate raw_input()
semantics. :-(
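
To make the three modes concrete, here is a rough Python model of that
argument convention. It is only a sketch: the name get_line_sketch is
hypothetical, and the negative case assumes that "raw_input() semantics"
means stripping the trailing newline and raising EOFError at end of
file. It is not the actual C code.

```python
import io


def get_line_sketch(stream, n):
    """Hypothetical model of the three modes of the n argument."""
    if n > 0:
        # Positive: read at most n characters of the current line.
        return stream.readline(n)
    line = stream.readline()
    if n == 0:
        # Zero: arbitrary length, newline kept, "" at end of file.
        return line
    # Negative (assumed raw_input() behaviour): EOF is an error and
    # the trailing newline is stripped.
    if not line:
        raise EOFError("EOF when reading a line")
    return line[:-1] if line.endswith("\n") else line


demo = io.StringIO("hello\nworld\n")
capped = get_line_sketch(demo, 3)      # at most 3 characters
rest = get_line_sketch(demo, 0)        # remainder of the first line
stripped = get_line_sketch(demo, -1)   # second line, newline removed
```

One function covering three behaviours through the sign of one integer
is what makes the logic awkward to follow.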

Unfortunately we can't use fgets(), even if it were faster than
getline(), because it doesn't tell how many characters it read.  On
files containing null bytes, readline() is supposed to treat these
like any other character; if your input is "abc\0def\nxyz\n", the
first readline() call should return "abc\0def\n".  But with fgets(),
you're left to look in the returned buffer for a null byte, and
there's no way (in general) to distinguish this result from an input
file that only consisted of the three characters "abc".  getline()
doesn't seem to have this problem, since its size is also an output
parameter.
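
The ambiguity can be illustrated from Python. This is only a sketch:
the helper c_string_view is hypothetical and merely imitates a C caller
that can find the end of an fgets() result only by scanning for the
first null byte. It does not call fgets() itself.

```python
import io

# The example input from above, with an embedded null byte.
data = b"abc\0def\nxyz\n"
stream = io.BytesIO(data)

# readline() treats the null byte like any other character.
first = stream.readline()
second = stream.readline()


def c_string_view(buf):
    """Hypothetical helper: what a caller sees when the only length
    information is the position of the first null byte."""
    return buf.split(b"\0", 1)[0]


# fgets() appends a terminating null after the bytes it stored.
buffer_from_long_line = first + b"\0"    # read from "abc\0def\n"
buffer_from_short_file = b"abc" + b"\0"  # file holding just "abc"

# Both buffers look identical through the C-string view.
same = (c_string_view(buffer_from_long_line)
        == c_string_view(buffer_from_short_file))
```

Here first holds all eight bytes of the line, while the two buffers
cannot be told apart by scanning for the terminator alone.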

> When this was discussed in comp.lang.python, someone also mentioned
> getc_unlocked(), which saves the overhead of locking the stream every
> time, but that didn't seem a fruitful avenue for exploration.

I've never heard of getc_unlocked; it's not in the (old) C standard.
If it's also a glibc thing, I doubt that using it would be faster than
getline().  If it's a new C standard (C9x) thing, we'll have to wait.

Fred reminded me that for e.g. Solaris, while everybody probably
compiles with GCC, that doesn't mean they are using glibc, so
in practice getline() will only help on Linux.

I'm slowly warming up to xreadlines(), although we must be careful to
consider the consequences (do other file-like objects need to support
it too?).
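
One way to think about the file-like-object question: a lazy line
iterator can be written against readline() alone, so any object that
offers readline() could get it for free. The sketch below is an
illustration under that assumption, not a proposed implementation; the
name xreadlines_sketch is hypothetical and it relies on generators.

```python
import io


def xreadlines_sketch(fileobj):
    """Yield lines one at a time from any object with readline(),
    instead of building the whole list as readlines() does."""
    while True:
        line = fileobj.readline()
        if not line:
            # readline() returns an empty string at end of file.
            break
        yield line


# Works on a non-file object that only imitates the file interface.
lazy_lines = list(xreadlines_sketch(io.StringIO("a\nb\nc")))
```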

--Guido van Rossum (home page: http://www.python.org/~guido/)