On the subject of replacing the current parser, I am actively working on that. See GitHub.com/gvanrossum/pegen.

On Tue, Jan 14, 2020 at 10:32 Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jan 14, 2020, at 05:22, Σταύρος Ντέντος <stdedos@gmail.com> wrote:
>
> Hello there,
>
> If I have simply missed the colon at the end of a for loop header
>
>  File "./bbq.py", line 160
>    for config_file in config_files
>                                  ^
> SyntaxError: invalid syntax
>
> the message is not as straightforward.

I think almost everyone would prefer it if the compiler could say “SyntaxError: missing colon at end of a compound statement header” or something more useful.

And that probably goes even more for this case:

    spam = eggs(cheese, (foo, bar)
    cheese = spam*2

The problem is to come up with a rule that could be applied to detect these cases given the information the simple LL(1) parser has available at the time of failure. I suspect there’s no way to do that without radically changing the parser architecture, keeping track of a lot more state, or partially re-parsing things in the error handler. (If it were easy, Guido would have done it back in 1.x.)

But maybe there’s a way to heuristically detect that these problems are _likely_ causes of the error (without having to be as ridiculously complicated as what Clang does with C++ code)? If you could find a way to make the error say “SyntaxError: invalid syntax (possibly missing colon at end of compound statement header)” in most simple “forgot the colon” cases and very few other cases, without massively disrupting everything, I think people would be happy with that.
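To make the heuristic idea concrete for the unclosed-bracket case, here is a rough sketch that never touches the parser: it only walks the token stream and remembers which brackets were opened and never closed. The function name `find_unclosed_bracket` is made up for illustration, and this is an assumption about how such a check could look, not something CPython does.

```python
import io
import tokenize

OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def find_unclosed_bracket(source):
    """Return (line, column, char) of the innermost unclosed bracket, or None.

    Hypothetical helper: a heuristic over tokens, not a real parse.
    """
    stack = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type != tokenize.OP:
                continue
            if tok.string in OPENERS:
                stack.append((tok.start[0], tok.start[1], tok.string))
            elif tok.string in CLOSERS and stack and OPENERS[stack[-1][2]] == tok.string:
                stack.pop()
    except (tokenize.TokenError, SyntaxError):
        # The tokenizer gives up at EOF inside an open bracket; by then the
        # stack already holds whatever was left open.
        pass
    return stack[-1] if stack else None


print(find_unclosed_bracket("spam = eggs(cheese, (foo, bar)\ncheese = spam*2\n"))
```

An error handler could use the returned position to point at the opening bracket on the first line instead of at the innocent line that follows it.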

You might even be able to take advantage of re-parsing without having to solve all the problems that go with that. For example, technically, you can’t even access the last logical line to reparse; practically, you can get it in the same cases the traceback can print it, and those are probably the only cases you need to heuristically improve the error handling. You could even maybe do a quick & dirty proof of concept in Python in an import hook, if you don’t want to dive into the middle of the C compiler code.
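A quick & dirty version of that re-parsing idea might look like the sketch below. It compiles the source, and on a SyntaxError it takes the reported line, appends a colon plus a dummy body, and checks whether that line alone now compiles. The names `syntax_error_hint`, `HEADER_KEYWORDS` and `HINT` are invented for this example, and the keyword list is deliberately limited to headers that can stand alone (so no `else`, `elif`, `except`, `try`).

```python
# Keywords that start a compound statement which is valid on its own.
HEADER_KEYWORDS = {"for", "while", "if", "def", "class", "with"}
HINT = "possibly missing colon at end of compound statement header"


def syntax_error_hint(source, filename="<string>"):
    """Return None if source compiles, else the error message, plus a hint
    when adding a colon to the offending line makes that line parse.

    Hypothetical helper: a heuristic sketch, not CPython's error handling.
    """
    try:
        compile(source, filename, "exec")
        return None
    except SyntaxError as err:
        message = err.msg
        lines = source.splitlines()
        if not err.lineno or err.lineno > len(lines):
            return message
        header = lines[err.lineno - 1].strip()
        words = header.split()
        if not words or words[0] not in HEADER_KEYWORDS or header.endswith(":"):
            return message
        try:
            # Re-parse only the offending line, patched with a colon and a dummy body.
            compile(header + ": pass\n", filename, "exec")
        except SyntaxError:
            return message
        return f"{message} ({HINT})"


print(syntax_error_hint("for config_file in config_files\n    print(config_file)\n"))
```

This only handles single-line headers and ignores trailing comments, but it shows that the "simple forgot-the-colon" cases can be recognized without touching the parser itself.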

As an alternative, there are lots of projects that apply more powerful parsing algorithms to Python. There’s not much call to replace CPython’s parser, because there aren’t any benefits to offset the costs. (At least assuming that the language is going to stay LL(1), to make it easy to parse in your head.) But if you could improve most of the most annoying error handling cases, that might be a different story. And these might also be easier to play with. (Some have pure Python implementations, and even the ones in C aren’t embedded in the middle of the compiler code.)

IIRC, early Java did something clever with a GLR parser that has LR(1) performance on all valid code and strictly bounded complexity on error recovery (so it may get as bad as worst-case cubic, but cubic on N<=5, so who cares). That let them usually produce error messages as good as most C compilers without the horrible mess of parsing that most C compilers need.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ILJNAN4E5VROSODWO2UWJDHP5DCVM56G/
--
--Guido (mobile)