[Pythonmac-SIG] Re: MacPython and line-endings

Jack Jansen jack@oratrix.nl
Sat, 06 Oct 2001 17:11:32 +0200


Recently, Chris Barker <chrishbarker@home.net> said:
> > - I don't like all the various ways to specify line endings (with
> >   'mac', 'Mac' and '\r' all being equivalent), I think I'd go with a
> >   simple '\r' (maybe with a symbolic constant mac='\r' somewhere).
> 
> I wanted the interface to be easy to use. I suppose if you are a
> programmer, you should know what a Mac line ending looks like, but that
> may not fit with CP4E. If you were to define a constant somewhere, where
> would you define it???

Hmm. I agree with your thinking that newbies won't know about
\r\n. But on the other hand newbies will never use the parameter in the
first place: they'll read files with any line ending and write files
with the local line ending. But this isn't really all that important a
point, feel free to do what you like.

> > - Interactive input is important, and the lookahead for \n should be
> >   tackled. I think the way to do it is different than what Guido
> >   suggests, I think that instead of peeking ahead for a \n if you see
> >   a \r you should set a flag (self.return_seen_skip_initial_newline)
> >   that will eat the newline upon the next read/readline. But: this has
> >   implications for tell(), as tell() _will_ have to do the peek to
> >   return the correct position for the beginning of the next
> >   line. And seek() should reset the flag.
> Also read().
> 
> There just isn't a simple way to do this. I really hadn't been thinking
> of interactive input...how important is it to support arbitrary line
> ending in interactive input? How often would it be coming from an
> unknown source? Not that it wouldn't be nice for completeness' sake in
> any case.

Maybe it isn't all that important. We can assume that sys.stdin
conforms to the local convention, I guess. And for interactive input
coming in over sockets (think of things like a Python MUD server
connected to via telnet) we'll probably get a known convention.

But: I don't think it's all that difficult either; I think my flag
proposal should handle all cases fairly easily. Or do you see problems
with it?
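
A minimal sketch of the flag idea, to make the tell()/seek() interplay
concrete. The class name FlaggedReader and the attribute name
skip_initial_newline are made up for illustration, and the wrapped
object is assumed to be any text stream that hands back raw characters
without doing newline translation of its own:

```python
import io


class FlaggedReader:
    """Hypothetical wrapper that defers eating the LF of a CR LF pair."""

    def __init__(self, fp):
        self.fp = fp
        self.skip_initial_newline = False

    def readline(self):
        chars = []
        while True:
            ch = self.fp.read(1)
            if not ch:
                break
            if self.skip_initial_newline:
                self.skip_initial_newline = False
                if ch == '\n':
                    continue  # second half of a '\r\n' pair: drop it
            if ch == '\r':
                # Do not peek for '\n' here; on a terminal or socket
                # that could block. Remember to skip it later instead.
                self.skip_initial_newline = True
                ch = '\n'
            chars.append(ch)
            if ch == '\n':
                break
        return ''.join(chars)

    def tell(self):
        # tell() must resolve a pending flag by peeking, so that it
        # reports the real start of the next line.
        if self.skip_initial_newline:
            pos = self.fp.tell()
            if self.fp.read(1) != '\n':
                self.fp.seek(pos)  # no '\r\n' pair here: undo the peek
            self.skip_initial_newline = False
        return self.fp.tell()

    def seek(self, pos):
        # Any explicit repositioning invalidates the pending skip.
        self.skip_initial_newline = False
        self.fp.seek(pos)


reader = FlaggedReader(io.StringIO('one\r\ntwo\rthree\n', newline=''))
first = reader.readline()
position = reader.tell()
print(repr(first), position)
```

Only tell() ever looks ahead, so readline() on interactive input
returns as soon as the '\r' arrives.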

I'm skipping the read()/readline() stuff, the more I think about it
the more I think that the readtoterminator() solution is the right
one. And if we want backward compatibility to Python versions that
don't have the readtoterminator() file object method we can add a
workaround to the class. We'd still have only a single place in the
code where we would have to look at every byte.

Read and readline would become really simple:
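   def readline(self, count=0):
       data = self.fp.readtoterminator('\r\n', count)
       if self.skipinitialnewline and data[:1] == '\n':
           # This '\n' is the second half of a '\r\n' pair: drop it, and
           # if nothing else came back, fetch the real next line.
           data = data[1:] or self.fp.readtoterminator('\r\n', count)
       if not data: return data
       self.skipinitialnewline = (data[-1] == '\r')
       if data[-1] == '\r':
           data = data[:-1] + '\n'
       return data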

   def read(self, count=0):
       data = ''
       while 1:
           chunk = self.readline(count)
           if not chunk: return data
           data = data + chunk
           if count:
               count = count - len(chunk)
               if count <= 0: return data
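
For Python versions without readtoterminator(), the workaround could
look roughly like the sketch below. readtoterminator() is only a
proposal, so the module-level fallback here, its exact semantics (read
up to and including the first terminator character, or at most count
characters), and the class name UniversalReader are all assumptions.
The readline() also guards against the case where the read returns
nothing but the '\n' left over from a '\r\n' pair:

```python
import io


def readtoterminator(fp, terminators, count=0):
    """Fallback for the proposed (hypothetical) file object method.

    Reads up to and including the first character that occurs in
    `terminators`, or at most `count` characters if count is non-zero.
    This is the single place that looks at every character.
    """
    chars = []
    while not count or len(chars) < count:
        ch = fp.read(1)
        if not ch:
            break
        chars.append(ch)
        if ch in terminators:
            break
    return ''.join(chars)


class UniversalReader:
    def __init__(self, fp):
        self.fp = fp
        self.skipinitialnewline = False

    def readline(self, count=0):
        data = readtoterminator(self.fp, '\r\n', count)
        if self.skipinitialnewline and data[:1] == '\n':
            # Leftover half of a '\r\n' pair: drop it and, if that was
            # all we got, read the real next line.
            data = data[1:] or readtoterminator(self.fp, '\r\n', count)
        if not data:
            return data
        self.skipinitialnewline = (data[-1] == '\r')
        if data[-1] == '\r':
            data = data[:-1] + '\n'
        return data


# newline='' keeps StringIO from translating anything itself.
mixed = io.StringIO('unix\nmac\rdos\r\n\nend', newline='')
reader = UniversalReader(mixed)
lines = []
while True:
    line = reader.readline()
    if not line:
        break
    lines.append(line)
print(lines)
```

Reading one character at a time is slow, but it is confined to the
fallback; a built-in readtoterminator() would replace it without
touching readline().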

> In any case, my goal is that something like this would become part of
> the built-in file object.
> [...]
> One reason it needs to be built in is that it could then be used for
> imports and execs() and all that, which is really where this all
> started.

This is a whole different can of worms. And I think that putting the
crossplatform newline functionality in the file objects isn't going to
get us closer to a solution:-(

The lowlevel import code uses stdio FILE * parameters all over the
place, so unless newline conversion is implemented at the stdio level
we will have to replace the whole import machinery by a
re-implementation in Python. This is doable, all the hooks are there
and there's prior art too if I'm not mistaken (IIRC someone did
imports from zip files or so).
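
As a rough illustration of what the Python-level route involves, the
core step of such a re-implementation is small: read the source in
binary mode, normalise the line endings, compile it, and execute it in
a fresh module namespace. The helper name load_source_any_newline is
invented for this sketch, and it leaves out everything a real import
hook also needs (path search, sys.modules registration, packages,
bytecode caching):

```python
import os
import tempfile
import types


def load_source_any_newline(name, path):
    """Hypothetical helper: load a module whose source may use any
    of the three line-ending conventions."""
    with open(path, 'rb') as f:
        source = f.read()
    # Handle '\r\n' first so it does not turn into two newlines.
    source = source.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    module = types.ModuleType(name)
    module.__file__ = path
    exec(compile(source, path, 'exec'), module.__dict__)
    return module


# Usage: a module written with Mac ('\r') line endings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'macmod.py')
    with open(path, 'wb') as f:
        f.write(b'x = 1\rdef f():\r    return x + 1\r')
    mod = load_source_any_newline('macmod', path)
    print(mod.f())
```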

> Jack, you had mentioned that you had some version of cross platform
> importing working with MacPython. What do you have now?

A very simple and efficient hack, for input only. The MSL stdio, that
MacPython uses, always calls a lowlevel internal routine to do \r->\n
mapping. MacPython now has a modified version of that routine that
will pass both \r and \n as \n. No support for \r\n, though.
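
The effect of a per-character mapping like this can be mimicked in a
few lines of Python. This is only a stand-in written for illustration,
not the MSL routine itself, and the function name is made up:

```python
def msl_style_map(data):
    r"""Map '\r' to '\n' one character at a time; '\n' passes through."""
    return data.replace('\r', '\n')


print(repr(msl_style_map('mac\runix\n')))
print(repr(msl_style_map('dos\r\n')))
```

Mac and Unix endings both come out as a single '\n', but a '\r\n'
pair comes out as two newlines, which is presumably why this approach
cannot cover DOS files without looking at more than one character.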
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm