[Python-bugs-list] [Bug #132850] unix line terminator on windows

noreply@sourceforge.net noreply@sourceforge.net
Sat, 17 Feb 2001 16:17:49 -0800


Bug #132850, was updated on 2001-Feb-17 10:45
Here is a current snapshot of the bug.

Project: Python
Category: Python Interpreter Core
Status: Open
Resolution: None
Bug Group: Platform-specific
Priority: 6
Submitted by: mpmak
Assigned to : tim_one
Summary: unix line terminator on windows

Details: 
SyntaxError/NameError when the first script line is terminated only by
\x0a, not \x0d\x0a

this script does nothing at all (every line is terminated with \x0a):
#
print '1 line'
print '2 line'

this one raises NameError - name p is not defined:
print '1 line'
print '2 line'

when the script has only a single line:
print '1 line'

it raises SyntaxError, but the traceback is funny:
pprint '1 line'
              ^
SyntaxError: invalid syntax


Follow-Ups:

Date: 2001-Feb-17 16:17
By: tim_one

Comment:
Thought about that, but it won't fly:  the C std (whether C89 or C99)
guarantees only one character of pushback via ungetc; the typical case here
would have 2, and whether or not that works is again a platform-dependent
accident.
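
For illustration only (this is not code from the thread or from pythonrun.c): a minimal C sketch of the one thing the standard does guarantee, a single character of pushback. The helper name peek_char is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: look at the next character without consuming it.
   It relies only on the single character of pushback that ungetc
   guarantees, so there is never more than one pending pushed-back char. */
static int peek_char(FILE *fp)
{
    int ch;

    assert(fp != NULL);
    ch = getc(fp);
    if (ch != EOF)
        ungetc(ch, fp);   /* one pushback directly after a read */
    return ch;
}
```

Pushing back two bytes in a row, as in the fread/ungetc loop quoted below, goes beyond this guarantee and may or may not work on a given C library.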
-------------------------------------------------------

Date: 2001-Feb-17 14:39
By: mpmak

Comment:

Works under NT.
Thanks.

PS: the same effect without fseek/ftell:

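/* Probe the first two bytes for the low half of the .pyc magic number. */
int ispyc = 0;
size_t bytesread = fread(buf, 1, 2, fp);
ispyc = bytesread == 2 &&
  ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic;
/* Not a .pyc: push the bytes back in reverse order (up to 2 ungetc calls). */
while (!ispyc && bytesread > 0) {
  ungetc(buf[--bytesread], fp);
}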

-------------------------------------------------------

Date: 2001-Feb-17 14:09
By: tim_one

Comment:
Please try pythonrun.c rev 2.123.

Since fseek is a platform-dependent accident in text mode, I don't want to
use that at all anymore.

zessin_5, how does rev 2.123 work for you under OpenVMS?  I won't close
this bug for a few days pending your answer.
-------------------------------------------------------

Date: 2001-Feb-17 13:45
By: mpmak

Comment:

I have tested it on Windows NT 4.0,MSVC 6.0 SP4, with python from cvs
and:

-x - works as expected inside cmd files
*.py - are compiled/executed properly
*.pyc - python can execute pyc files too

Line numbers in buggy cmd/py files are shown correctly; in any case, this
script fails when executing line 3:

#@python -x "%~f0" %* & goto :EOF
print 'ok'
makeanerror

-------------------------------------------------------

Date: 2001-Feb-17 13:31
By: tim_one

Comment:
zessin_5, then you must have rebroken -x under OpenVMS.  Yes?

-------------------------------------------------------

Date: 2001-Feb-17 13:27
By: tim_one

Comment:
Sorry, can't figure out what you think that code accomplishes.  Did you try
an example using -x?  Windows is the primary reason -x exists, so no hack
that leaves -x broken on Windows is acceptable.  The getc/ungetc business
is needed so that in *case* -x was specified (and without a change to the C
API, we have no way to know whether it was at this point), the \n that -x
ungetc'ed gets pushed back.  Else line numbers in tracebacks are off by one
under -x, and that's not acceptable either.
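
A minimal C sketch of the behaviour described above, for illustration: the first line is consumed, but its terminating \n is pushed back so the parser still sees an empty line 1. The helper names skip_first_line and count_newlines are hypothetical and this is not the actual interpreter code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper sketching what -x is described as doing:
   consume the first line, then push its '\n' back so that later
   line numbers in tracebacks are not off by one. */
static void skip_first_line(FILE *fp)
{
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            ungetc(ch, fp);   /* the single pushed-back newline */
            break;
        }
    }
}

/* Count the '\n' characters still readable from the stream. */
static int count_newlines(FILE *fp)
{
    int ch, n = 0;

    while ((ch = getc(fp)) != EOF)
        if (ch == '\n')
            n++;
    return n;
}
```

With a three-line script, three newlines remain readable after the skip, so the third source line is still reported as line 3. Any later code that repositions the stream has to preserve that pushed-back \n, which is the constraint described above.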
-------------------------------------------------------

Date: 2001-Feb-17 13:11
By: mpmak

Comment:

Brute force hack helps - at least for MS:

#ifdef _MSC_VER
		/* Remember the position, probe two bytes, then seek back. */
		const long currentpos = ftell(fp);
		int ispyc = fread(buf, 1, 2, fp) == 2 &&
		        ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic;
		fseek(fp, currentpos, SEEK_SET);
#else
... current CVS code goes here
#endif

-------------------------------------------------------

Date: 2001-Feb-17 12:56
By: zessin_5

Comment:
Additional info:

A change in pythonrun.c also partly broke
processing on OpenVMS.

The same effect (doubling the first character)
happens when the file is not in stream linefeed
format - otherwise it works as before.
Text editors, however, create files in record
format.

The problem is that the C library on OpenVMS
can only seek on record boundaries - I don't
want to go into too much detail here.

I've restored the old code from version 2.1a2
in routine 'maybe_pyc_file()' in my copy and
it started working again.
-------------------------------------------------------

Date: 2001-Feb-17 12:47
By: tim_one

Comment:
Indeed, from stepping thru MS's ftell(), they do a great deal of expensive
fiddling for text-mode streams *assuming* that every \n in the stdio buffer
was originally an \r\n on disk.  When that isn't true, the adjustments they
make yield bizarre results.  We can't do anything about that.
-------------------------------------------------------

Date: 2001-Feb-17 12:26
By: tim_one

Comment:
I'm not sure this can be fixed with reasonable effort.

The patch that allowed .pyc files to get executed from the command-line,
and whether or not they have a .pyc/.pyo extension, broke the -x option
(skip first line), by rewinding the file to see whether it begins with the
right magic number.  That undid what -x does (i.e., skip over the first
line).

So I slopped in another hack to restore the file position in case (as is in
fact almost always the case) the .pyc magic-number hack didn't find what it
was looking for.

And there's the rub.  It *turns out* that, using MS's libraries, and
assuming FILE* fp is at the start of the text-mode stream, after

int ch = getc(fp);
long pos = ftell(fp);

then ch is reliably set to the first character in the file.  However, pos
is set to 1 if and only if the first line *ends* with \r\n.  If it ends
with plain \n, pos is left at 0.  This is bizarre and darned hard to
explain, but that is the way it works.

The .pyc hackery later fseek's back to pos and does ungetc(ch).  Since in
your case pos was set to 0, that ends up "stuttering" the first character
(ungetc('p') effectively puts an extra 'p' at the start of the file,
because pos was left at 0).
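
The sequence can be replayed with a small C sketch (illustrative only; the function name replay_first_line and the two-byte fread standing in for the magic-number probe are assumptions, not the pythonrun.c code). It opens a file in text mode, does getc and ftell, reads a little further, seeks back to pos, pushes ch back, and then reads the first line as the parser would see it.

```c
#include <assert.h>
#include <stdio.h>

/* Replays getc / ftell / fseek(pos) / ungetc(ch) on a text-mode stream
   and stores the first line as it would then be read into 'line'.
   Returns the value ftell reported after the getc, or -1 on failure. */
static long replay_first_line(const char *path, char *line, int size)
{
    FILE *fp;
    unsigned char buf[2];
    size_t got;
    long pos;
    int ch;

    assert(path != NULL && line != NULL && size > 0);
    fp = fopen(path, "r");              /* text mode, as for .py files */
    if (fp == NULL)
        return -1;
    ch = getc(fp);
    pos = ftell(fp);
    if (ch == EOF || pos < 0) {
        fclose(fp);
        return -1;
    }
    got = fread(buf, 1, 2, fp);         /* stand-in for the magic probe */
    (void)got;
    fseek(fp, pos, SEEK_SET);           /* back to where ftell said we were */
    ungetc(ch, fp);                     /* and re-deliver the first char */
    if (fgets(line, size, fp) == NULL)
        line[0] = '\0';
    fclose(fp);
    return pos;
}
```

Where text mode does no line-end translation, ftell reports 1 here and the line comes back intact. With the MS library discussed in this report and a file using plain \n line ends, pos is reported as 0 and the line comes back with its first character doubled.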

Since files with Unix line-ends are not proper text-mode files under
Windows, I doubt Microsoft would consider their behavior buggy here, and
neither would the C std (C guarantees very little about how text mode
works).

And I don't see any hope of fixing it short of either:

1. Opening .py files in binary mode and doing line-end translations
ourselves (a good idea, actually, but A Project).

or

2. Redoing the .pyc hack from scratch:  it should be done *before* -x
processing.  But that may require changing Python's C API.

In short, it's a mess.
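
As a rough sketch of what option 1 could look like (an illustration under stated assumptions, not a proposed patch): the stream is opened in binary mode and line ends are translated by hand. The helper name getc_universal is hypothetical, and treating a lone \r as a line end is an assumption beyond what is discussed here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one character from a stream opened in
   binary mode ("rb"), translating "\r\n" and a lone "\r" to "\n".
   It never needs more than one character of pushback. */
static int getc_universal(FILE *fp)
{
    int ch;

    assert(fp != NULL);
    ch = getc(fp);
    if (ch == '\r') {
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);   /* lone \r: keep the following char */
        return '\n';
    }
    return ch;
}
```

Because the stream is binary, ftell and fseek then deal in plain byte offsets, which sidesteps the text-mode position adjustments described above.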
-------------------------------------------------------

Date: 2001-Feb-17 11:18
By: tim_one

Comment:
Bizarre!  Assigned to me.  Worked OK in 2.0, and no idea how it could have
got broken.  Well, in truth Python never did anything to make this work, it
was simply that the MS stdio library delivered \n regardless of whether \n
or \r\n terminated a line.  That is, it was *nice* that it worked, but in
truth it was an accident.  Later.
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132850&group_id=5470