[Python-bugs-list] [ python-Bugs-501622 ] Missing trailing newline should not raise SyntaxError

noreply@sourceforge.net noreply@sourceforge.net
Thu, 14 Nov 2002 08:42:32 -0800


Bugs item #501622, was opened at 2002-01-09 23:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=501622&group_id=5470

Category: Parser/Compiler
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: David Bolen (db3l)
Assigned to: Guido van Rossum (gvanrossum)
>Summary: Missing trailing newline should not raise SyntaxError

Initial Comment:
If you have a module that you wish to compile using 
the builtin compile() function (in 'exec' mode), it 
will fail with a SyntaxError if that module does not 
have a newline as its final token.

The same module can be executed directly by the 
interpreter, or imported by another module, and Python 
can properly compile and save a pyc for the module.

I believe the difference is rooted in the fact that 
the tokenizer (tokenizer.c, in tok_nextc()) 
will "fake" a newline at the end of a file if it 
doesn't find one, but it will not do so when 
tokenizing a string buffer.

What I'm not certain of is whether faking such a token 
for strings as well won't break something else (such 
as when parsing a string for an expression rather than 
a full module).  But without such a change, you have a 
state where a module that works (and compiles) in 
other circumstances cannot be read into memory and 
compiled with the compile() builtin.

This came up while tracking down a problem with 
failures using Gordon McMillan's Installer package 
which compiles modules using compile() before 
including them in the archive.

I believe this is true for all releases since at least 
1.5.2.

-- David

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-22 19:35

Message:
Logged In: YES 
user_id=6380

Hm, adding it to builtin_compile isn't enough. We'd have to
add it to exec as well.

I think the lexer and/or parser should take care of this --
just as it should take care of accepting \r as well as \n as
well as \r\n.

Yes, it's hard to find. But there's got to be a way.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 18:14

Message:
Logged In: YES 
user_id=35752

I'm +1 on builtin_compile adding the newline.  It's the
lazy way out and it's better than every person hacking
with the parser stumbling into it and coming up with their
own work around.

Guido?

----------------------------------------------------------------------

Comment By: David Bolen (db3l)
Date: 2002-03-22 18:06

Message:
Logged In: YES 
user_id=53196

If compile() is being used in exec mode with a non-
terminated multi-line string, it's not going to work unless 
the application generates that copy itself in any event.  
So without an interpreter fix, I'd think the string copy is 
inevitable, and it might simplify things to have the 
builtin function take care of it.  It's something easy to 
overlook at the application level and could thus be fixed 
in one place rather than at each point of use.

On the other hand, I also noticed something I overlooked 
when first encountering the problem - the 2.2 docs added 
some text to compile() talking about this need for 
termination.  So it could be argued that it's now a 
documented restriction, and should the newline append (with 
any requisite string duplication) be needed, it leaves it 
to the individual applications rather than forcing it in 
the builtin.

Not to mention a documentation solution could thus be 
declared already done.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-22 18:01

Message:
Logged In: YES 
user_id=31435

Well, the user can't append an '\n' inplace either.  The 
question is whether we do that for them, or let it blow 
up.  OTOH, codeop.py has a lot of fun <wink> now trying to 
compile as-is, then with one '\n' tacked on, then with two 
of 'em.  It would take me a long time to figure out exactly 
why it's doing all that, and to guess exactly how it would 
break.
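
The retry idea described above (as-is, then one '\n', then two) can be sketched roughly as follows. This is only an illustration of the pattern, not a reproduction of what codeop.py does; the helper name compile_with_retries is hypothetical.

```python
def compile_with_retries(source, filename="<string>", mode="exec"):
    """Try the source as-is, then with one and with two newlines appended.

    Hypothetical helper for illustration only. Returns the first code
    object that compiles; re-raises the last SyntaxError if none does.
    """
    last_error = None
    for suffix in ("", "\n", "\n\n"):
        try:
            return compile(source + suffix, filename, mode)
        except SyntaxError as exc:
            last_error = exc
    raise last_error


# Usage: source without a terminating newline.
retry_ns = {}
exec(compile_with_retries("x = 1"), retry_ns)
```

Each attempt builds a new string, so this costs up to two extra copies of the source in the worst case.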

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-22 17:46

Message:
Logged In: YES 
user_id=6380

Probably, unless the start symbol is "expr" (which doesn't
need a newline).

But it would mean copying a potentially huge string -- we
can't append the \n in place.
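
To illustrate the distinction, assuming the "expr" start symbol corresponds to compile()'s 'eval' mode: a single expression compiles without any terminating newline. A small sketch:

```python
# 'eval' mode parses a single expression, so the source needs no
# trailing newline (assumption: this is the "expr" case meant above).
expr_code = compile("1 + 2", "<string>", "eval")
result = eval(expr_code)
```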

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-22 17:41

Message:
Logged In: YES 
user_id=31435

Would it make sense for builtin_compile() to append a 
newline itself (say, if one weren't already present)?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-22 17:20

Message:
Logged In: YES 
user_id=6380

> the tok_nextc code is hairy and whatever
> I tried broke something else.

That's exactly what happened to me when I tried to fix this
myself long ago. :-(

The workaround is simple enough: whoever calls compile()
should append a newline to the string just to be sure.
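
A minimal sketch of that caller-side workaround, written for current Python. The helper name compile_source is hypothetical and not part of any Python API; it only wraps the builtin compile().

```python
def compile_source(source, filename="<string>"):
    """Compile module source in 'exec' mode, newline-terminating it first.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not source.endswith("\n"):
        # Strings are immutable, so this builds a copy of the whole source.
        source = source + "\n"
    return compile(source, filename, "exec")


# Usage: the input deliberately lacks a trailing newline.
code = compile_source("def foo():\n  pass")
namespace = {}
exec(code, namespace)
```

The copy happens only when the newline is missing, so already-terminated sources are passed through unchanged.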

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-22 17:07

Message:
Logged In: YES 
user_id=35752

I ran into this bug myself when writing the PTL compiler.
Here's a test case:

code = "def foo():\n  pass"
open("bug.py", "w").write(code)
import bug # works
compile(code, "<string>", "exec") # doesn't work

I traced this bug to tok_nextc.  If the input is coming from
a file and the last bit of input doesn't end with a newline
then one is faked.  This doesn't happen if the input is
coming from a string.  I spent time trying to figure out
how to fix it but the tok_nextc code is hairy and whatever
I tried broke something else.

----------------------------------------------------------------------
