[Python-Dev] Better SyntaxError messages

Tue, 4 Jul 2000 05:05:38 -0700 (PDT)

On Mon, 3 Jul 2000, Ka-Ping Yee wrote:
> It doesn't even have to be its own class of error, i suppose,
> as long as it gets indicated some way ("SyntaxError: invalid
> indentation" would be fine).

It turns out that this should be quite easy.  If it weren't
past 4am i would be posting a patch instead of just a verbal
suggestion right now -- but here's how to do it.

For this kind of thing:

>     >>> if 1:
>     ...   3
>     ...  4
>     inconsistent dedent
>       File "<stdin>", line 3
>         4
>         ^
>     SyntaxError: invalid token

...clearly it's trivial, as the case is already marked in the
code (tokenizer.c).  Instead of dumping the "inconsistent dedent"
message to stderr, return E_INDENT.
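Here is a toy model in C of the tokenizer change. It shows how an indentation stack can report an inconsistent dedent as an error code instead of printing to stderr. It is not the tokenizer.c code. The struct, the function names and the SK_* codes are all hypothetical, and SK_E_INDENT only stands in for E_INDENT.

```c
#include <assert.h>

/* Hypothetical error codes and token kinds for this sketch only;
   they are NOT CPython's real E_* or token values. */
enum { SK_OK = 0, SK_E_INDENT = 1, SK_E_TOODEEP = 2 };
enum { SK_NOCHANGE = 0, SK_INDENT = 1, SK_DEDENT = 2 };

#define SK_MAXINDENT 100

struct indent_stack {
    int cols[SK_MAXINDENT]; /* column of each open indentation level */
    int top;                /* index of the innermost level */
};

void indent_init(struct indent_stack *st)
{
    st->cols[0] = 0;        /* the outermost level starts at column 0 */
    st->top = 0;
}

/* Compare the indentation column of a new logical line with the stack.
   On success, *kind is SK_INDENT, SK_DEDENT or SK_NOCHANGE and *count is
   how many INDENT/DEDENT tokens to emit.  An inconsistent dedent is
   returned as SK_E_INDENT rather than reported on stderr. */
int indent_check(struct indent_stack *st, int col, int *kind, int *count)
{
    assert(st->top >= 0 && st->top < SK_MAXINDENT);
    *kind = SK_NOCHANGE;
    *count = 0;
    if (col == st->cols[st->top])
        return SK_OK;
    if (col > st->cols[st->top]) {
        if (st->top + 1 >= SK_MAXINDENT)
            return SK_E_TOODEEP;
        st->cols[++st->top] = col;
        *kind = SK_INDENT;
        *count = 1;
        return SK_OK;
    }
    /* Dedent: pop levels until we reach a column at or left of col. */
    while (st->top > 0 && col < st->cols[st->top]) {
        st->top--;
        (*count)++;
    }
    /* Landing between two open levels is the "inconsistent dedent" case. */
    if (col != st->cols[st->top])
        return SK_E_INDENT;
    *kind = SK_DEDENT;
    return SK_OK;
}
```

A line indented 2 columns followed by one indented 1 column fails here with SK_E_INDENT, because column 1 matches no open level. The caller can then turn that code into a message.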

For this situation:

>     >>>  3 
>       File "<stdin>", line 1
>         3
>         ^
>     SyntaxError: invalid syntax

...we have an INDENT where none is expected.  This is also
easy.  At the end of PyParser_AddToken, we simply check to
see if the token that caused the problem was indent-related:

    if (type == INDENT || type == DEDENT) return E_INDENT;
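Below is a minimal sketch of the check above. It uses hypothetical SK_* token and error codes in place of CPython's INDENT, DEDENT, E_SYNTAX and E_INDENT. The stuck parser classifies the offending token, and the caller turns the resulting code into a message. The two message strings come from this thread. How err_input really maps codes to messages is not shown.

```c
#include <assert.h>

/* Hypothetical stand-ins for token types and parser error codes. */
enum { SK_T_NAME, SK_T_NUMBER, SK_T_NEWLINE, SK_T_INDENT, SK_T_DEDENT };
enum { SK_E_SYNTAX = 1, SK_E_INDENT_ERR = 2 };

/* Error code to return when the parser is stuck on token 'type':
   an unexpected INDENT or DEDENT becomes an indentation error,
   anything else stays a generic syntax error. */
int stuck_error(int type)
{
    if (type == SK_T_INDENT || type == SK_T_DEDENT)
        return SK_E_INDENT_ERR;
    return SK_E_SYNTAX;
}

/* Message a caller could attach to the SyntaxError. */
const char *error_message(int err)
{
    /* Only the two codes defined in this sketch are expected here. */
    assert(err == SK_E_SYNTAX || err == SK_E_INDENT_ERR);
    return err == SK_E_INDENT_ERR ? "invalid indentation" : "invalid syntax";
}
```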

Finally, the most interesting case:

>     >>> if 1:
>     ... 3
>       File "<stdin>", line 2
>         3
>         ^
>     SyntaxError: invalid syntax

...we expected an INDENT and didn't get one.  This is a
matter of checking the accelerator table to see what we
were expecting.  Also not really that hard:

    int expected; /* at the top of PyParser_AddToken */

    ...

    if (s->s_lower == s->s_upper - 1) /* only one possibility */
    {
        expected = ps->p_grammar->g_ll.ll_label[s->s_lower].lb_type;
        if (expected == INDENT || expected == DEDENT) return E_INDENT;
    }
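Here is the same idea in isolation, as a hedged sketch. It uses a simplified, hypothetical state whose acceptable labels occupy indices [s_lower, s_upper). Treating s_upper as exclusive is an assumption that matches the "only one possibility" test above: s_lower == s_upper - 1 means exactly one label. The struct and token names are invented for illustration and do not reproduce the real grammar structures field for field.

```c
#include <assert.h>

/* Simplified, hypothetical model: labels with indices in
   [s_lower, s_upper) are the ones this state can accept next. */
struct sk_label { int lb_type; };
struct sk_state { int s_lower; int s_upper; };

/* Hypothetical token types for this sketch only. */
enum { SK_NUMBER, SK_NEWLINE, SK_INDENT_T, SK_DEDENT_T, SK_COLON };

/* Return the single token type the state expects, or -1 if it could
   accept more than one kind of token. */
int single_expected(const struct sk_state *s, const struct sk_label *labels)
{
    assert(s->s_lower < s->s_upper);    /* sketch assumes a non-empty range */
    if (s->s_lower == s->s_upper - 1)   /* only one possibility */
        return labels[s->s_lower].lb_type;
    return -1;
}

/* A missing INDENT/DEDENT is an indentation error, not a plain
   syntax error. */
int expects_indentation(const struct sk_state *s, const struct sk_label *labels)
{
    int expected = single_expected(s, labels);
    return expected == SK_INDENT_T || expected == SK_DEDENT_T;
}
```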

I like this last case best, as it means we can produce
more useful messages for a variety of syntax errors!  When
only one kind of token can come next, Python can now tell
you which one it is.  After inserting this:

    /* Stuck, report syntax error */
    fprintf(stderr, "Syntax error: unexpected %s",
        _PyParser_TokenNames[type]);
    if (s->s_lower == s->s_upper - 1) {
        fprintf(stderr, " (wanted %s)",
                _PyParser_TokenNames[ps->p_grammar->
                    g_ll.ll_label[s->s_lower].lb_type]);
    }
    fprintf(stderr, "\n");

... i played around a bit:

    >>> (3,4]
    Syntax error: unexpected RSQB (wanted RPAR)
      File "<stdin>", line 1
        (3,4]
            ^
    SyntaxError: invalid syntax
    >>> 3..
    Syntax error: unexpected NEWLINE (wanted NAME)
      File "<stdin>", line 1
        3..
          ^
    SyntaxError: invalid syntax
    >>> 3.)
    Syntax error: unexpected RPAR
      File "<stdin>", line 1
        3.)
          ^
    SyntaxError: invalid syntax
    >>> a^^
    Syntax error: unexpected CIRCUMFLEX
      File "<stdin>", line 1
        a^^
          ^
    SyntaxError: invalid syntax
    >>> if 3:
    ... 3
    Syntax error: unexpected NUMBER (wanted INDENT)
      File "<stdin>", line 2
        3
        ^
    SyntaxError: invalid syntax
    >>> 4,,
    Syntax error: unexpected COMMA
      File "<stdin>", line 1
        4,,
          ^
    SyntaxError: invalid syntax
    >>> [3,)
    Syntax error: unexpected RPAR (wanted RSQB)
      File "<stdin>", line 1
        [3,)
           ^
    SyntaxError: invalid syntax
    >>> if a == 3 and
    Syntax error: unexpected NEWLINE
      File "<stdin>", line 1
        if a == 3 and
                    ^
    SyntaxError: invalid syntax
    >>> if a = 3: 
    Syntax error: unexpected EQUAL (wanted COLON)
      File "<stdin>", line 1
        if a = 3:
             ^
    SyntaxError: invalid syntax

This isn't going to cover all cases, but i thought it was pretty cool.

So, in summary:

    - Producing E_INDENT errors is easy, and should require
      just three small changes (one in tokenizer.c and two
      in parser.c, specifically PyParser_AddToken)

    - We can get some of the information needed to produce
      better syntax error messages in general, but passing
      it back out of the parser to pythonrun.c (err_input)
      will take a little more thought.
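For the second point, one possible shape is for the parser to fill in a small struct and let the caller build the message. This is purely an assumption and not an existing CPython interface. The struct, the token table and format_unexpected() below are all hypothetical. The message format copies the experimental output shown earlier.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical token numbering and names for this sketch only. */
enum { SK_RPAR, SK_RSQB, SK_NEWLINE, SK_NAME, SK_NTOKENS };
static const char *const sk_token_names[SK_NTOKENS] = {
    "RPAR", "RSQB", "NEWLINE", "NAME"
};

/* What the parser could hand back to its caller instead of printing. */
struct parse_error_info {
    int error;      /* e.g. a syntax or indentation error code */
    int token;      /* type of the token the parser got stuck on */
    int expected;   /* single expected token type, or -1 if ambiguous */
};

/* Build "unexpected X" or "unexpected X (wanted Y)" into buf.
   Returns the snprintf result, i.e. the length the full message needs. */
int format_unexpected(const struct parse_error_info *info, char *buf, size_t size)
{
    assert(info->token >= 0 && info->token < SK_NTOKENS);
    if (info->expected >= 0 && info->expected < SK_NTOKENS)
        return snprintf(buf, size, "unexpected %s (wanted %s)",
                        sk_token_names[info->token],
                        sk_token_names[info->expected]);
    return snprintf(buf, size, "unexpected %s", sk_token_names[info->token]);
}
```

Take the (3,4] case. A struct holding token RSQB and expected RPAR formats as "unexpected RSQB (wanted RPAR)". With expected set to -1, the "(wanted ...)" part is left out. Writing into a caller-supplied buffer rather than stderr lets pythonrun.c decide where the text ends up.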

-- ?!ng

"This code is better than any code that doesn't work has any right to be."
    -- Roger Gregory, on Xanadu