[Python-bugs-list] [Bug #132850] unix line terminator on windows

noreply@sourceforge.net noreply@sourceforge.net
Sat, 17 Feb 2001 13:11:27 -0800


Bug #132850, was updated on 2001-Feb-17 10:45
Here is a current snapshot of the bug.

Project: Python
Category: Python Interpreter Core
Status: Open
Resolution: None
Bug Group: Platform-specific
Priority: 6
Submitted by: mpmak
Assigned to: tim_one
Summary: unix line terminator on windows

Details: 
Syntax/Name error when the first script line is terminated only by \x0a, not
\x0d\x0a.

This does nothing at all (every line is terminated with \x0a):
#
print '1 line'
print '2 line'

NameError - name 'p' is not defined:
print '1 line'
print '2 line'

When the script has only a single line:
print '1 line'

the result is a SyntaxError, but the traceback is funny:
pprint '1 line'
              ^
SyntaxError: invalid syntax


Follow-Ups:

Date: 2001-Feb-17 13:11
By: mpmak

Comment:

Brute force hack helps - at least for MS:

#ifdef _MSC_VER
		const long currentpos = ftell(fp);
		int ispyc = fread(buf, 1, 2, fp) == 2 &&
		        ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic;
		fseek(fp, currentpos, SEEK_SET);
#else
... current CVS code goes here
#endif
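
For readers who want to try the idea outside pythonrun.c, here is a minimal
self-contained sketch of the same save/read/restore pattern. The function
name starts_with_halfmagic is made up for illustration, buf is declared as
unsigned char here so the shift cannot sign-extend (the original declaration
is not shown above), and the sketch assumes a seekable stream; it is not the
code that is in CVS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the code in pythonrun.c.
   Returns 1 if the next two bytes, read low byte first, equal halfmagic.
   The file position is restored before returning. */
int starts_with_halfmagic(FILE *fp, unsigned int halfmagic)
{
    unsigned char buf[2];   /* unsigned, so buf[1] << 8 cannot sign-extend */
    long currentpos;
    int ispyc;

    assert(fp != NULL);

    currentpos = ftell(fp);             /* remember where we are */
    if (currentpos < 0)
        return 0;                       /* not seekable: report "no match" */

    ispyc = fread(buf, 1, 2, fp) == 2 &&
            (((unsigned int)buf[1] << 8) | buf[0]) == halfmagic;

    if (fseek(fp, currentpos, SEEK_SET) != 0)
        return 0;                       /* could not restore the position */

    return ispyc;
}
```

Note that the sketch calls ftell() before any character has been read, so it
does not exercise the text-mode case discussed in the other comments.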

-------------------------------------------------------

Date: 2001-Feb-17 12:56
By: zessin_5

Comment:
Additional info:

A change in pythonrun.c also partly broke
processing on OpenVMS.

The same effect (doubling the first character)
happens when the file is not in stream linefeed
format - otherwise it works as before.
Text editors, however, create files in record
format.

The problem is that the C library on OpenVMS
can only seek on record boundaries - I don't
want to go into too much detail, here.

I've restored the old code from version 2.1a2
in routine 'maybe_pyc_file()' in my copy and
it started working again.
-------------------------------------------------------

Date: 2001-Feb-17 12:47
By: tim_one

Comment:
Indeed, from stepping thru MS's ftell(), they do a great deal of expensive
fiddling for text-mode streams *assuming* that every \n in the stdio buffer
was originally an \r\n on disk.  When that isn't true, the adjustments they
make yield bizarre results.  We can't do anything about that.
-------------------------------------------------------

Date: 2001-Feb-17 12:26
By: tim_one

Comment:
I'm not sure this can be fixed with reasonable effort.

The patch that allowed .pyc files to be executed from the command line,
whether or not they have a .pyc/.pyo extension, broke the -x option
(skip first line) by rewinding the file to see whether it begins with the
right magic number.  That undid what -x does (i.e., skip over the first
line).

So I slopped in another hack to restore the file position in case (as is in
fact almost always the case) the .pyc magic-number hack didn't find what it
was looking for.

And there's the rub.  It *turns out* that, using MS's libraries, and
assuming FILE* fp is at the start of the text-mode stream, after

int ch = getc(fp);
long pos = ftell(fp);

then ch is reliably set to the first character in the file.  However, pos
is set to 1 if and only if the first line *ends* with \r\n.  If it ends
with plain \n, pos is left at 0.  This is bizarre and darned hard to
explain, but that is the way it works.

The .pyc hackery later fseek's back to pos and does ungetc(ch).  Since in
your case pos was set to 0, that ends up "stuttering" the first character
(ungetc('p') effectively puts an extra 'p' at the start of the file,
because pos was left at 0).
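
The stutter can be reproduced on any platform by feeding the restore step a
wrong position by hand.  The sketch below is only an illustration:  the
helper names are made up, and reported_pos stands in for whatever ftell()
returned, so no MS-specific behavior is needed to see the effect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: seek back to a remembered position and push the
   peeked character back, as described for the .pyc check above. */
int restore_after_peek(FILE *fp, long pos, int ch)
{
    assert(fp != NULL);
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return ungetc(ch, fp) == EOF ? -1 : 0;
}

/* Hypothetical helper: peek the first character, restore using
   reported_pos (a stand-in for ftell's result), then read the first
   line into out.  Returns out, or NULL on failure. */
char *first_line_after_restore(FILE *fp, long reported_pos,
                               char *out, int size)
{
    int ch;

    assert(fp != NULL && out != NULL && size > 0);

    rewind(fp);
    ch = getc(fp);                      /* consumes the first character */
    if (ch == EOF || restore_after_peek(fp, reported_pos, ch) != 0)
        return NULL;
    return fgets(out, size, fp);
}
```

With reported_pos == 1 the first line comes back unchanged.  With
reported_pos == 0 the seek lands on the 'p' again and ungetc() adds a
second one in front of it, which matches the "pprint" in the traceback.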

Since files with Unix line-ends are not proper text-mode files under
Windows, I doubt Microsoft would consider their behavior buggy here, and
neither would the C std (C guarantees very little about how text mode
works).

And I don't see any hope of fixing it short of either:

1. Opening .py files in binary mode and doing line-end translations
ourselves (a good idea, actually, but A Project).

or

2. Redoing the .pyc hack from scratch:  it should be done *before* -x
processing.  But that may require changing Python's C API.

In short, it's a mess.
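
To give a rough idea of what option 1 involves, here is a minimal sketch of
reading lines from a stream opened in binary mode and collapsing \r\n to \n
by hand.  The name read_line_translated is made up, a lone \r is passed
through unchanged, and this is nowhere near what the tokenizer would
actually need; it only shows the translation step.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from a binary-mode stream into buf,
   turning "\r\n" into "\n".  At most size - 1 characters are stored and
   the result is always NUL-terminated.  Returns buf, or NULL if nothing
   could be read (end of file). */
char *read_line_translated(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 1);

    while (n + 1 < size && (c = getc(fp)) != EOF) {
        if (c == '\r') {
            int next = getc(fp);
            if (next == '\n')
                c = '\n';               /* \r\n collapses to \n */
            else if (next != EOF)
                ungetc(next, fp);       /* lone \r is kept as data */
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;                      /* end of line reached */
    }
    buf[n] = '\0';
    return n > 0 ? buf : NULL;
}
```

Because the stream is binary, ftell() and fseek() work in plain byte
offsets, so the position bookkeeping no longer depends on how the C library
treats text mode.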
-------------------------------------------------------

Date: 2001-Feb-17 11:18
By: tim_one

Comment:
Bizarre!  Assigned to me.  Worked OK in 2.0, and no idea how it could have
got broken.  Well, in truth Python never did anything to make this work, it
was simply that the MS stdio library delivered \n regardless of whether \n
or \r\n terminated a line.  That is, it was *nice* that it worked, but in
truth it was an accident.  Later.
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132850&group_id=5470