On 9 July 2015 at 08:03, Terry Reedy <tjreedy@udel.edu> wrote:
On 7/8/2015 1:53 AM, Nick Coghlan wrote:
One of the more opaque error messages new Python users can encounter is a syntax error due to unmatched parentheses:
      File "/home/me/myfile.py", line 11
        data = func()
        ^
    SyntaxError: invalid syntax
I'm not sure it would be feasible though - we generate syntax errors from a range of locations where we don't have access to the original token data any more :(
Could that be changed?
I think we're already down to only having four places where they can be thrown (tokeniser, parser, symbol table analysis, byte code generator), so reducing it further seems unlikely.
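The sketch below illustrates the point about lost token data. Whichever stage rejects the code, what reaches the user is a SyntaxError that carries little more than a message, a location and the offending line of text. The four snippets are chosen to trip, as far as I know, CPython's tokeniser, parser, symbol table analysis and code generator respectively. That attribution is an assumption about CPython's internals, and the exact messages vary between versions.

```python
# Snippets meant to trip different CPython stages. Which stage reports each
# one is an assumption based on CPython's layout; messages vary by version.
SAMPLES = {
    "tokeniser": "s = 'never closed\n",
    "parser": "x = = 1\n",
    "symbol table": "nonlocal x\n",
    "code generator": "return 1\n",
}


def describe_syntax_errors(samples):
    """Compile each snippet and record what its SyntaxError exposes."""
    report = {}
    for stage, source in samples.items():
        try:
            compile(source, "<sample>", "exec")
        except SyntaxError as exc:
            # A message, a location and the source line are what reach the
            # user; the tokens the parser had already consumed are not attached.
            report[stage] = (exc.msg, exc.lineno, exc.offset, exc.text)
        else:
            report[stage] = None  # compiled cleanly, which would be surprising
    return report


if __name__ == "__main__":
    for stage, info in describe_syntax_errors(SAMPLES).items():
        print(f"{stage:>14}: {info}")
```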
I have occasionally thought about developing a table for Python (and rewriting in Python), but indents and dedents are not trivial. (Even tokenize.py does not handle \t indents correctly.) Maybe I should think a bit harder. Idle has an option to syntax-check a module without running it. If compile messages are not improved, it would certainly be sensible to run a separate fence-checker, at least when check-only is requested, for better error messages. These could potentially include 'missing :' when a header 'opened' by for/while/if/elif/else/class/def/with is not closed by ':'.
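A separate fence-checker of the kind described above could be sketched roughly as follows. This is a minimal, hypothetical sketch: check_fences and its helpers are illustrative names, not anything in Idle or the stdlib. It skips comments and string literals, remembers where each bracket was opened so it can report the unmatched opener rather than the point where parsing gave up, and flags a for/while/if/elif/else/class/def/with header whose logical line never reaches a ':' outside brackets. It assumes "\n" line endings and ignores corner cases such as a lambda inside a header.

```python
import re

# Hypothetical sketch: none of these names come from Idle or the stdlib.
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}
# Header keywords from the list above (try/except/finally could be added).
HEADER_KEYWORDS = {"for", "while", "if", "elif", "else", "class", "def", "with"}
NAME = re.compile(r"[^\W\d]\w*")


def position(source, index):
    """Convert a string index into (1-based line, 0-based column)."""
    line = source.count("\n", 0, index) + 1
    col = index - (source.rfind("\n", 0, index) + 1)
    return line, col


def skip_string(source, i):
    """Return the index just past the literal starting at source[i], or None
    if it is never closed. Prefixes (r, b, f) are scanned as names."""
    quote = source[i:i + 3] if source[i:i + 3] in ("'''", '"""') else source[i]
    j = i + len(quote)
    while j < len(source):
        if source[j] == "\\":
            j += 2                      # a backslash consumes the next character
        elif source.startswith(quote, j):
            return j + len(quote)
        elif source[j] == "\n" and len(quote) == 1:
            return None                 # single-quoted literals end at EOL
        else:
            j += 1
    return None


def check_fences(source):
    """Report unclosed or mismatched brackets, unterminated strings and block
    headers whose logical line has no ':' outside brackets."""
    if not source.endswith("\n"):
        source += "\n"
    problems = []
    stack = []             # (opener, line, col) for every bracket still open
    header = None          # (keyword, line) when the logical line is a header
    saw_colon = False      # ':' seen at bracket depth 0 on this logical line
    at_line_start = True
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "#":                           # comment: skip to end of line
            i = source.index("\n", i)
            continue
        if ch == "\\" and source.startswith("\n", i + 1):
            i += 2                              # explicit line continuation
            continue
        if ch == "\n":
            if not stack:                       # end of a logical line
                if header and not saw_colon:
                    problems.append("missing ':' at end of %r header on line %d" % header)
                header, saw_colon, at_line_start = None, False, True
            i += 1
            continue
        if ch in " \t\r\f":
            i += 1
            continue
        name = NAME.match(source, i)
        if ch in "'\"":
            end = skip_string(source, i)
            if end is None:
                problems.append("unterminated string at line %d, col %d" % position(source, i))
                return problems
            i = end
        elif name:
            if at_line_start and name.group() in HEADER_KEYWORDS:
                header = (name.group(), position(source, i)[0])
            i = name.end()
        else:
            if ch in OPENERS:
                stack.append((ch, *position(source, i)))
            elif ch in CLOSERS:
                line, col = position(source, i)
                if not stack:
                    problems.append("unmatched %r at line %d, col %d" % (ch, line, col))
                elif stack[-1][0] != CLOSERS[ch]:
                    problems.append("%r at line %d, col %d does not match %r opened at line %d, col %d"
                                    % (ch, line, col, *stack.pop()))
                else:
                    stack.pop()
            elif ch == ":" and not stack and not source.startswith("=", i + 1):
                saw_colon = True                # a ':=' walrus does not count
            i += 1
        at_line_start = False
    for opener, line, col in stack:
        problems.append("%r opened at line %d, col %d was never closed (expected %r)"
                        % (opener, line, col, OPENERS[opener]))
    return problems
```

For example, check_fences("x = foo(1,\ndata = func()\n") points back at the "(" on line 1 instead of complaining about line 2.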
A separate fence-checker sounds like a plausible direction. As it turned out, the particular case that prompted this thread wasn't due to missing parentheses at all; it was a block of code like:
    try:
        ....
    statement dedented early
    except ...:
        ...
I think Stephen Turnbull may also be on to something: we don't necessarily need to tell the user which fenced token was left unmatched earlier. It may be enough to tell them what *would* have been acceptable as the next token where the caret is pointing, so they have something more specific to consider than "invalid syntax". For example, in the case I was attempting to help debug remotely, the error message might have been:
      File "/home/me/myfile.py", line 11
        data = func()
        ^
    SyntaxError: expected "except" or "finally"
Other fence errors would then be:
    SyntaxError: expected ":"
    SyntaxError: expected ")"
    SyntaxError: expected "]"
    SyntaxError: expected "}"
    SyntaxError: expected "import"  # from ... import ...
    SyntaxError: expected "else"    # ... if ... else ...
    SyntaxError: expected "in"      # for ... in ...
And once 'async' is a proper keyword:
    SyntaxError: expected "def", "with" or "for"  # async ...
The currently problematic cases are those in https://docs.python.org/3/reference/grammar.html where seeing "foo" at one point in the token stream sets up the expectation in the parser that "bar" must appear a bit further along. At the moment, the parser bails out saying "I wasn't expecting this!", and doesn't answer the obvious follow-on question "Well, what *were* you expecting?".
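Stephen's suggestion maps naturally onto a recursive-descent parser: the code that checks for the next token already knows which alternatives it would accept, so it can name them in the error. The toy parser below is a hypothetical sketch over a made-up four-rule grammar, not CPython's parser or grammar; MiniParser, expect(), describe() and error_for() are illustrative names. It reproduces the message shapes proposed above.

```python
import re

TOKEN = re.compile(r"\w+|\S")   # identifiers/numbers, or any single symbol


def describe(options):
    """Render alternatives as '"a"', '"a" or "b"', or '"a", "b" or "c"'."""
    quoted = ['"%s"' % opt for opt in options]
    if len(quoted) == 1:
        return quoted[0]
    return ", ".join(quoted[:-1]) + " or " + quoted[-1]


class MiniParser:
    """Toy recursive-descent parser for a made-up, one-line grammar:

        stmt := "from" NAME "import" NAME
              | "for" NAME "in" NAME ":"
              | "async" ("def" | "with" | "for")
              | NAME "=" NAME "if" NAME "else" NAME
    """

    def __init__(self, line):
        self.line = line
        # Keep each token's column so an error can point at it.
        self.tokens = [(m.group(), m.start()) for m in TOKEN.finditer(line)]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def fail(self, expected):
        # Point at the offending token, or just past the end of the line.
        col = self.tokens[self.pos][1] if self.pos < len(self.tokens) else len(self.line)
        raise SyntaxError("expected " + expected, ("<input>", 1, col + 1, self.line))

    def expect(self, *options):
        """Consume the next token if it is one of `options`; otherwise report
        every alternative that would have been accepted here."""
        if self.peek() not in options:
            self.fail(describe(options))
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            self.fail("a name")
        self.pos += 1

    def statement(self):
        first = self.peek()
        if first == "from":
            self.expect("from")
            self.name()
            self.expect("import")
            self.name()
        elif first == "for":
            self.expect("for")
            self.name()
            self.expect("in")
            self.name()
            self.expect(":")
        elif first == "async":
            self.expect("async")
            self.expect("def", "with", "for")
        else:
            self.name()
            self.expect("=")
            self.name()
            self.expect("if")
            self.name()
            self.expect("else")
            self.name()
        if self.peek() is not None:
            self.fail("end of line")


def error_for(line):
    """Return the SyntaxError message for `line`, or None if it parses."""
    try:
        MiniParser(line).statement()
    except SyntaxError as exc:
        return exc.msg
    return None
```

Because expect() receives every acceptable alternative at once, the three-way "async" case falls out of the same helper as the single-token cases; error_for("for x y:") reports that "in" was expected.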
Strings would also qualify for a similar kind of treatment, as the current error message doesn't tell us whether the parser was looking for closing single or double quotes:
    $ python3 -c "'"
      File "<string>", line 1
        '
        ^
    SyntaxError: EOL while scanning string literal
    $ python3 -c "'''"
      File "<string>", line 1
        '''
        ^
    SyntaxError: EOF while scanning triple-quoted string literal
    $ python3 -c '"'
      File "<string>", line 1
        "
        ^
    SyntaxError: EOL while scanning string literal
    $ python3 -c '"""'
      File "<string>", line 1
        """
        ^
    SyntaxError: EOF while scanning triple-quoted string literal
This discussion has headed into a part of the compiler chain that I don't actually know myself, though - the only thing I've ever had to do with the parser is modifying the grammar file and adding the brute-force error message override when someone leaves out the parentheses on print() and exec() calls.
Regards,
Nick.
--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia