[Python-Dev] Better SyntaxError messages
Ka-Ping Yee
ping@lfw.org
Tue, 4 Jul 2000 05:05:38 -0700 (PDT)
On Mon, 3 Jul 2000, Ka-Ping Yee wrote:
> It doesn't even have to be its own class of error, i suppose,
> as long as it gets indicated some way ("SyntaxError: invalid
> indentation" would be fine).
It turns out that this should be quite easy. If it weren't
past 4am i would be posting a patch instead of just a verbal
suggestion right now -- but here's how to do it.
For this kind of thing:
> >>> if 1:
> ...   3
> ...  4
> inconsistent dedent
> File "<stdin>", line 3
> 4
> ^
> SyntaxError: invalid token
...clearly it's trivial, as the case is already marked in the
code (tokenizer.c). Instead of dumping the "inconsistent dedent"
message to stderr, return E_INDENT.
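To make the idea concrete, here is a minimal standalone sketch of an indentation stack that reports an inconsistent dedent through its return value instead of printing to stderr. It is not the tokenizer.c code: the struct, the function names, MAX_INDENT and the numeric values of E_OK and E_INDENT are all assumptions made up for illustration.

```c
#include <assert.h>

#define MAX_INDENT 100

/* Result codes for this sketch only; E_INDENT is the code proposed in
   the post, and the numeric values here are made up. */
enum { E_OK = 0, E_INDENT = 1 };

/* Simplified stand-in for the tokenizer's indentation bookkeeping. */
struct indent_state {
    int stack[MAX_INDENT];  /* column of each open indentation level */
    int top;                /* index of the innermost level */
    int pending;            /* >0: INDENTs owed, <0: DEDENTs owed */
};

void indent_init(struct indent_state *st)
{
    st->stack[0] = 0;
    st->top = 0;
    st->pending = 0;
}

/* Feed the leading-whitespace width of a new logical line. */
int indent_update(struct indent_state *st, int col)
{
    assert(col >= 0);
    if (col == st->stack[st->top])
        return E_OK;
    if (col > st->stack[st->top]) {
        if (st->top + 1 >= MAX_INDENT)
            return E_INDENT;        /* sketch: treat "too deep" the same way */
        st->stack[++st->top] = col;
        st->pending++;
        return E_OK;
    }
    /* Dedent: pop levels until one is no deeper than col. */
    while (st->top > 0 && col < st->stack[st->top]) {
        st->top--;
        st->pending--;
    }
    if (col != st->stack[st->top])
        return E_INDENT;            /* was: print "inconsistent dedent" */
    return E_OK;
}
```

With lines at columns 0, 3 and 2 (the shape of the quoted example), the third call returns E_INDENT, so the caller can decide how to word the error.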
For this situation:
> >>>  3
> File "<stdin>", line 1
> 3
> ^
> SyntaxError: invalid syntax
...we have an INDENT where none is expected. This is also
easy. At the end of PyParser_AddToken, we simply check to
see if the token that caused the problem was indent-related:
    if (type == INDENT || type == DEDENT) return E_INDENT;
Finally, the most interesting case:
> >>> if 1:
> ... 3
> File "<stdin>", line 2
> 3
> ^
> SyntaxError: invalid syntax
...we expected an INDENT and didn't get one. This is a
matter of checking the accelerator table to see what we
were expecting. Also not really that hard:
    int expected; /* at the top of PyParser_AddToken */
    ...
    if (s->s_lower == s->s_upper - 1) /* only one possibility */
    {
        expected = ps->p_grammar->g_ll.ll_label[s->s_lower].lb_type;
        if (expected == INDENT || expected == DEDENT) return E_INDENT;
    }
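The two parser-side checks (unexpected INDENT/DEDENT, and INDENT/DEDENT being the only thing expected) can be tried out in isolation with a small mock. This is only a sketch: the structs below keep just the fields used in the snippets above, the token numbers and error codes are placeholders rather than the values from token.h or errcode.h, and the helper names are hypothetical.

```c
#include <assert.h>

/* Placeholder token numbers and result codes for this sketch. */
enum { NAME = 1, NUMBER = 2, INDENT = 5, DEDENT = 6, COLON = 11 };
enum { E_SYNTAX = 14, E_INDENT = 99 };

#define NO_SINGLE_EXPECTATION (-1)

/* Cut-down stand-ins for the parser's label and state records. */
struct sketch_label { int lb_type; };
struct sketch_state { int s_lower; int s_upper; };

/* Same test as in the post: exactly one label is acceptable here. */
int single_expected(const struct sketch_state *s,
                    const struct sketch_label *labels)
{
    if (s->s_lower == s->s_upper - 1)
        return labels[s->s_lower].lb_type;
    return NO_SINGLE_EXPECTATION;
}

int is_indent_token(int type)
{
    return type == INDENT || type == DEDENT;
}

/* Decide which error code to report once the parser is stuck on `got`. */
int classify_stuck(const struct sketch_state *s,
                   const struct sketch_label *labels, int got)
{
    int expected = single_expected(s, labels);

    if (is_indent_token(got))
        return E_INDENT;    /* INDENT/DEDENT where none is allowed */
    if (expected != NO_SINGLE_EXPECTATION && is_indent_token(expected))
        return E_INDENT;    /* wanted INDENT/DEDENT, got something else */
    return E_SYNTAX;
}
```

Keeping the "what was expected" lookup in its own helper means the same value can later be reused for the error message, not only for the E_INDENT decision.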
I like this last case best, as it means we can produce
more useful messages for a variety of syntax errors! When
there is a single particular kind of token expected, now
Python can tell you what it is. After inserting this:
    /* Stuck, report syntax error */
    fprintf(stderr, "Syntax error: unexpected %s",
            _PyParser_TokenNames[type]);
    if (s->s_lower == s->s_upper - 1) {
        fprintf(stderr, " (wanted %s)",
                _PyParser_TokenNames[labels[s->s_lower].lb_type]);
    }
    fprintf(stderr, "\n");
... i played around a bit:
>>> (3,4]
Syntax error: unexpected RSQB (wanted RPAR)
File "<stdin>", line 1
(3,4]
^
SyntaxError: invalid syntax
>>> 3..
Syntax error: unexpected NEWLINE (wanted NAME)
File "<stdin>", line 1
3..
^
SyntaxError: invalid syntax
>>> 3.)
Syntax error: unexpected RPAR
File "<stdin>", line 1
3.)
^
SyntaxError: invalid syntax
>>> a^^
Syntax error: unexpected CIRCUMFLEX
File "<stdin>", line 1
a^^
^
SyntaxError: invalid syntax
>>> if 3:
... 3
Syntax error: unexpected NUMBER (wanted INDENT)
File "<stdin>", line 2
3
^
SyntaxError: invalid syntax
>>> 4,,
Syntax error: unexpected COMMA
File "<stdin>", line 1
4,,
^
SyntaxError: invalid syntax
>>> [3,)
Syntax error: unexpected RPAR (wanted RSQB)
File "<stdin>", line 1
[3,)
^
SyntaxError: invalid syntax
>>> if a == 3 and
Syntax error: unexpected NEWLINE
File "<stdin>", line 1
if a == 3 and
^
SyntaxError: invalid syntax
>>> if a = 3:
Syntax error: unexpected EQUAL (wanted COLON)
File "<stdin>", line 1
if a = 3:
^
SyntaxError: invalid syntax
This isn't going to cover all cases, but i thought it was pretty cool.
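For anyone who wants to experiment with the message wording without rebuilding the interpreter, the formatting step can be sketched on its own, writing into a caller-supplied buffer rather than stderr. The token table below is a small illustrative subset with arbitrary indices (it is not _PyParser_TokenNames), and format_stuck_message and NO_EXPECTED are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset of tokens; indices are arbitrary in this sketch. */
enum { T_NAME, T_NUMBER, T_NEWLINE, T_INDENT, T_RPAR, T_RSQB,
       T_COLON, T_COMMA, T_EQUAL, T_COUNT };

static const char *const token_names[T_COUNT] = {
    [T_NAME] = "NAME",       [T_NUMBER] = "NUMBER",
    [T_NEWLINE] = "NEWLINE", [T_INDENT] = "INDENT",
    [T_RPAR] = "RPAR",       [T_RSQB] = "RSQB",
    [T_COLON] = "COLON",     [T_COMMA] = "COMMA",
    [T_EQUAL] = "EQUAL",
};

#define NO_EXPECTED (-1)   /* more than one token would have been fine */

/* Build "unexpected X" or "unexpected X (wanted Y)" in buf.
   Returns what snprintf returns (length the full message needs). */
int format_stuck_message(char *buf, size_t size, int got, int expected)
{
    assert(buf != NULL && size > 0);
    assert(got >= 0 && got < T_COUNT);

    if (expected >= 0 && expected < T_COUNT)
        return snprintf(buf, size, "unexpected %s (wanted %s)",
                        token_names[got], token_names[expected]);
    return snprintf(buf, size, "unexpected %s", token_names[got]);
}
```

Calling it with T_RSQB and T_RPAR gives the same wording as the `(3,4]` example above, minus the "Syntax error: " prefix.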
So, in summary:
- Producing E_INDENT errors is easy, and should require
just three small changes (one in tokenizer.c and two
in parser.c, specifically PyParser_AddToken)
- We can get some info we need to produce better syntax
error messages in general, but this requires a little
more thought about how to pass the info back out of
the parser to pythonrun.c (err_input).
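One possible shape for the second point, purely as a sketch: have the parser fill in a small detail record next to its return code, and let the caller choose the wording. Nothing below is existing interpreter API. The struct, both function names, and the token and error numbers are assumptions for illustration; only the "invalid indentation" and "invalid syntax" strings come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder codes and token numbers for this sketch. */
enum { E_SYNTAX = 14, E_INDENT = 99 };
enum { NUMBER = 2, INDENT = 5, COMMA = 12 };

#define NO_EXPECTED (-1)

/* What the parser knew at the moment it got stuck. */
struct parse_error_detail {
    int error;      /* E_SYNTAX or E_INDENT */
    int token;      /* token type that could not be accepted */
    int expected;   /* the single acceptable token type, or NO_EXPECTED */
};

/* Parser side: record the details (if the caller asked for them)
   and hand back the plain error code as before. */
int record_stuck(struct parse_error_detail *detail,
                 int error, int token, int expected)
{
    if (detail != NULL) {
        detail->error = error;
        detail->token = token;
        detail->expected = expected;
    }
    return error;
}

/* Caller side (the role err_input plays): pick the message text. */
const char *describe_error(const struct parse_error_detail *detail)
{
    assert(detail != NULL);
    return detail->error == E_INDENT ? "invalid indentation"
                                     : "invalid syntax";
}
```

Allowing a NULL detail pointer keeps callers that only care about the return code working unchanged.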
-- ?!ng
"This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu