[Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line conversion?

Tim Peters tim.one@home.com
Wed, 11 Apr 2001 20:14:19 -0400


[Just van Rossum]
> Nope: \r's get translated to \n and for whatever reason \n's get
> translated to \r... So when opening a unix file on the Mac, it
> will look like it has \r line endings and when opening a Windows
> text file on the Mac, it will appear as if it has \n\r line endings...

Then it's probably a Good Thing Jack disabled this code, since it wouldn't
have done anything useful on a Mac anyway (for Python to ever see \r\n the
source file would have had to contain \n\r, which is nobody's text file
convention).

>> Etc:  submit a patch that makes the code above "work", and I'm
>> sure it would be accepted, but a non-Mac person can't guess
>> what's needed.

> That's probably easy enough -- although it would require changing all
> tokenizer code that looks for \n to also look for \r, including
> PyOS_ReadLine(), so it goes well beyond the snippet you posted.

No, there's nothing wrong with the tokenizer code:  it's coded in C, and the
C text convention is that lines end with \n, period.  Reliance on that
convention is ubiquitous -- and properly so.  What we need instead are
platform-specific implementations of fgets() functionality, which deliver
lines containing \n where and only where the platform Python is supposed to
believe a line ends.  Then nothing else in the parser needs to be touched
(and, indeed, the current \r\n mini-hack could be thrown away).
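
A minimal sketch of what such an fgets() replacement might look like follows.
It rests on assumptions not in the original discussion: the name eol_fgets is
hypothetical, the stream is assumed to be opened in binary mode so the C
library does no translation of its own, and for simplicity it accepts all of
"\n", "\r" and "\r\n" rather than only the one convention a given platform
should believe in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Python's source: an fgets() work-alike
 * that treats "\n", "\r" and "\r\n" each as one line end and always stores
 * a single '\n' in buf. Returns buf, or NULL if end-of-file is reached
 * before any byte is read. Read errors are not distinguished from EOF. */
char *
eol_fgets(char *buf, int size, FILE *fp)
{
    char *p = buf;
    int c;

    assert(buf != NULL && size > 1 && fp != NULL);
    while (p < buf + size - 1) {
        c = getc(fp);
        if (c == EOF)
            break;
        if (c == '\r') {
            /* Swallow the '\n' of a "\r\n" pair; push anything else back. */
            c = getc(fp);
            if (c != '\n' && c != EOF)
                ungetc(c, fp);
            c = '\n';
        }
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    *p = '\0';
    return (p == buf) ? NULL : buf;
}
```

A caller that only ever looks for '\n' can use this unchanged in place of
fgets(). A stricter per-platform variant would recognize just one of the
three conventions and pass the other bytes through untouched.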

> And then there's the Python file object...

Different issue.  If this ever gets that far, note that the crunch to speed
up line-at-a-time file input ended up *requiring* use of the native fgets()
on Windows, as that was the only way on that platform to avoid having the OS
do layers of expensive multithreading locks for each character read.  So
there's no efficient way in general to get Windows to recognize \r line
endings short of implementing our own stdio from the ground up.  On other
platforms, fileobject.c's get_line() reads one character at a time, and I
expect its test for "is this an EOL char?" could be liberalized at reasonable
cost.

OTOH, how does the new-fangled Mac OS fit into all this?  Perhaps, for
compatibility, their C libraries already recognize both Unix and Mac Classic
line conventions, and deliver plain \n endings for both?  Or did they blow
that part too <wink>?