On Tue, Jul 19, 2016 at 07:40:42AM -0700, Rustom Mody wrote:
My suggested solution involved this: currently the lexer (basically an automaton) reveals which state it's in when it throws an error mentioning "identifier". Suggested change:
    if in_ident_state:
        if current_char is allowable as ident_char:
            continue as before
        elif current_char is ASCII:
            Usual error
        else:
            throw error eliding the "in_ident state"
    else:
        as is...
I'm sorry, you've lost me. Is this pseudo-code (1) of the current CPython lexer, (2) what you imagine the current CPython lexer does, or (3) what you think it should do? Because you call it a "change", but you're only showing one state, so it's not clear if it's the beginning or ending state. Basically I guess what I'm saying is that if you are suggesting a concrete change to the lexer, you should be more precise about what actually needs to change.
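To make one possible reading of the proposal concrete, here is a minimal sketch of an identifier-scanning step. It assumes (this is an interpretation, not the CPython lexer) that an ASCII character which cannot continue an identifier simply ends the identifier and is left to ordinary tokenizing, while a non-ASCII character that cannot continue it produces an error naming the character, without mentioning "identifier". The function name `scan_identifier` is hypothetical.

```python
def scan_identifier(src, pos):
    """Hypothetical helper: scan an identifier starting at src[pos].

    Returns (identifier_text, end_position). An empty identifier is
    returned when src[pos] is an ASCII character that cannot start one.
    """
    start = pos
    while pos < len(src):
        ch = src[pos]
        # The first character must be a valid identifier start; later
        # characters only need to be valid continuations (e.g. digits).
        candidate = ch if pos == start else "a" + ch
        if candidate.isidentifier():
            pos += 1                      # still inside the identifier
        elif ch.isascii():
            break                         # ordinary terminator: defer to the usual rules
        else:
            # Non-ASCII, non-identifier character: report the character
            # itself instead of claiming it is "in identifier".
            raise SyntaxError(f"invalid character {ch!r} (U+{ord(ch):04X})")
    return src[start:pos], pos
```

For example, `scan_identifier("abc def", 0)` stops at the space, while `scan_identifier("ab×c", 0)` raises an error that names `'×'` and its code point.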
BTW, after my last post I tried some things and found other unsatisfactory (to me) behavior in this area; to wit:
    >>> x = 0o19
      File "<stdin>", line 1
        x = 0o19
               ^
    SyntaxError: invalid syntax
Of course the 9 cannot appear in an octal constant, but "SyntaxError"?? That seems a little overgeneral.
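The same error can be reproduced without the interactive prompt by passing the source to the built-in `compile()`, which also shows that the problem is detected at compile time: the assignment never runs. This is a small sketch; `compile_error` is a hypothetical helper name. The exact message depends on the CPython version: older releases say only "invalid syntax", while newer ones report something more specific about the octal digit, so the sketch checks only the exception type and line number.

```python
def compile_error(source):
    """Return the SyntaxError raised while compiling source, or None."""
    try:
        # Compiling only; the code object is never executed.
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc
    return None

err = compile_error("x = 0o19")
print(type(err).__name__, err.lineno, err.msg)
```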
My preferred fix: make a LexicalError exception that subclasses SyntaxError.
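As a sketch of what such a subclass could look like (both `LexicalError` and the checker `check_octal_literal` are hypothetical; neither exists in Python), the point of subclassing is that existing `except SyntaxError` handlers keep working while the error carries a more specific message:

```python
class LexicalError(SyntaxError):
    """Hypothetical: a SyntaxError raised while tokenizing the source."""


def check_octal_literal(text):
    """Hypothetical checker for a 0o-prefixed literal; returns its value."""
    if not text.lower().startswith("0o"):
        raise ValueError(f"not an octal literal: {text!r}")
    digits = text[2:]
    if not digits:
        raise LexicalError(f"missing digits in octal literal {text!r}")
    for ch in digits:
        if ch not in "01234567":
            raise LexicalError(f"invalid digit {ch!r} in octal literal {text!r}")
    return int(digits, 8)


try:
    check_octal_literal("0o19")
except SyntaxError as exc:  # an existing handler still catches the subclass
    print(type(exc).__name__, exc)
```

Because `LexicalError` is a subclass, code that only cares that the input is invalid syntax needs no changes; only code that wants the finer distinction would catch `LexicalError` explicitly.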
What's the difference between a LexicalError and a SyntaxError? Under what circumstances is it important to distinguish between them? It would be nice to have a more descriptive error message, but why should I care whether the invalid syntax "0o19" is caught by a lexer or a parser or the byte-code generator or the peephole optimizer or something else?

Really all I need to care about is:

- it is invalid syntax;
- why it is invalid syntax (9 is not a legal octal digit);
- and preferably, that it is caught at compile-time rather than run-time.

--
Steve