[Pythonmac-SIG] Re: MacPython and line-endings

Jack Jansen jack@oratrix.nl
Fri, 12 Oct 2001 22:58:05 +0200


Recently, Guido van Rossum <guido@python.org> said:
> > That's really the main reason for my original impression that this
> > should be done at stdio level: all the logic that is in import.c (and
> > importdl.c, and probably in the source command, etc) knows nothing
> > about Python objects and uses normal C conventions. So a Python based
> > solution (or a solution on the level of fileobjects) won't cut it if
> > you want this to work also for scripts and modules that are imported.
> 
> Aha.  But here we have a subset of the original problem: there's no
> interactive reading, and no seeking.

Ok, here's my strawman solution.

char *Py_UniversalNewlineFgets(char *, int, FILE*, PyObject *);
size_t Py_UniversalNewlineFread(void *, size_t, size_t, FILE *, PyObject *);

The last argument can be either NULL or a PyObject; the FILE* should
be opened in binary mode.

If the routines encounter a \r they do one of two things. If the last
argument is NULL they read the next char and either gobble it up if
it's a \n or ungetc() it. If the FileObject is available they don't
peek ahead but they set the skipinitialnewline flag in the file object
instead. Of course they also honour that flag in this case.
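
A minimal sketch of the NULL case (peek one char after a \r, gobble it
if it is a \n, otherwise ungetc() it). The name universal_fgets is
hypothetical and only illustrates the idea; it is not the proposed
Py_UniversalNewlineFgets, and it leaves out the file object argument
entirely. It assumes the FILE* was opened in binary mode and that the
buffer has room for at least one character plus the terminating NUL.

```c
#include <stdio.h>

/* Hypothetical helper: fgets() lookalike that returns \r, \n and \r\n
   line endings all as a single \n. Covers only the "no file object"
   case described above. fp must be opened in binary mode. */
char *universal_fgets(char *buf, int size, FILE *fp)
{
    int n = 0;
    int c;

    if (size < 2)
        return NULL;           /* need room for one char plus NUL */
    while (n < size - 1) {
        c = getc(fp);
        if (c == EOF)
            break;
        if (c == '\r') {
            /* Peek at the next char: swallow it if it is \n,
               otherwise push it back for the next read. */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            c = '\n';
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;             /* end of line reached */
    }
    if (n == 0)
        return NULL;           /* EOF or error before any char was read */
    buf[n] = '\0';
    return buf;
}
```

The peek is what makes this variant unsuitable for interactive input: a
lone \r at the end of a typed line would block until the next character
arrives.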

Parser.c and the import code and any other interested party can use
these routines to get universal newline support.

fileobject.c always calls these routines with the object
parameter. file.seek() also clears the flag, as does writing.
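
The flag-based variant can be sketched the same way. Here the struct
nl_state and the function translate_chunk are hypothetical stand-ins
for the state that would live in the file object and for the reading
routine; they work on memory buffers rather than a FILE* so the logic
is easy to follow. No peeking is needed: a \r is returned as \n right
away, and the flag remembers that a directly following \n must be
dropped.

```c
#include <stddef.h>

/* Hypothetical stand-in for the state kept in the file object. */
typedef struct {
    int skip_initial_newline;  /* set after a \r was returned as \n */
} nl_state;

/* Translate \r and \r\n to \n without looking ahead. out must have
   room for at least n bytes. Returns the number of bytes written. */
size_t translate_chunk(nl_state *st, const char *in, size_t n, char *out)
{
    size_t i, w = 0;

    for (i = 0; i < n; i++) {
        char c = in[i];
        if (st->skip_initial_newline) {
            st->skip_initial_newline = 0;
            if (c == '\n')
                continue;      /* second half of a \r\n pair: drop it */
        }
        if (c == '\r') {
            st->skip_initial_newline = 1;
            c = '\n';
        }
        out[w++] = c;
    }
    return w;
}
```

In this sketch a seek or a write would simply reset
skip_initial_newline to 0, which is also why an fseek() done behind
the file object's back leaves the flag stale.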

BUT: if a FILE* is ever fseek()ed without our knowledge we are
hosed. We could provide a Py_UniversalNewlineFseek(..., PyObject *)
but if the FILE* from a FileObject is passed to, say, a third party
library we're still hosed. I thought of storing the ftell() value in
the skipinitialnewline flag (with -1 meaning "don't skip") but that
won't fly because of interactive input. I think we'll just have to
live with this: all fseek()s outside fileobject.c are on binary files,
AFAICT.

By having state in the FileObject we could add another feature I would
like very much (but feel free to accuse me of feeping creaturism and
shoot it down): a flag indicating what the newline convention was so
far. The values would be unknown, cr, lf, crlf and
mixed (implementations of these values could well be 0, 1, 2, 4 and
anything else:-). As was noted here previously, Mac text editors tend
to be "newline convention preserving" and this flag would make that
easy to implement.
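
A rough sketch of how such a flag could be computed, using the 0, 1,
2, 4 encoding suggested above so that "mixed" is just any value with
more than one bit set. The enum names and the function
newline_convention are made up for illustration, and the sketch scans
a complete buffer in one go; a real implementation would update the
flag incrementally while reading.

```c
#include <stddef.h>

/* Hypothetical encoding: one bit per convention seen so far. */
enum {
    NL_UNKNOWN = 0,
    NL_CR      = 1,
    NL_LF      = 2,
    NL_CRLF    = 4
};

/* Return the OR of all newline conventions found in buf[0..n-1].
   A result with more than one bit set means "mixed". */
int newline_convention(const char *buf, size_t n)
{
    int seen = NL_UNKNOWN;
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < n && buf[i + 1] == '\n') {
                seen |= NL_CRLF;
                i++;           /* skip the \n of the \r\n pair */
            } else {
                seen |= NL_CR;
            }
        } else if (buf[i] == '\n') {
            seen |= NL_LF;
        }
    }
    return seen;
}
```

An editor that wants to preserve the convention could then write \r,
\n or \r\n back depending on which single bit is set, and fall back to
the platform default for unknown or mixed files.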

Sigh, it looks like I've done a large part of the PEP already:-(
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm