On Tuesday, July 19, 2016 at 7:41:38 PM UTC+5:30, Neil Girdhar wrote:

On Tue, Jul 19, 2016 at 8:18 AM Rustom Mody wrote:

On Tuesday, July 19, 2016 at 5:06:17 PM UTC+5:30, Neil Girdhar wrote:

On Tue, Jul 19, 2016 at 7:21 AM Steven D'Aprano wrote:
On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:

> IOW
> 1. The lexer is internally (evidently from the error message) so
> ASCII-oriented that any “unicode-junk” just defaults out to identifiers
> (presumably comments are dealt with earlier) and then if that lexing action
> fails it mistakenly pinpoints a wrong *identifier* rather than just an
> impermissible character, as Python 2 does

You seem to be jumping to a rather large conclusion here. Even if you
are right that the lexer considers all otherwise-unexpected characters
to be part of an identifier, why is that a problem?

It's a problem because those characters could never be part of an identifier. So it seems like a bug.
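A small standard-library sketch (not from the thread) of which characters those are. `str.isidentifier()` reports whether a character can start an identifier, and prefixing a letter reports whether it can continue one. The helper name `ident_role` and the sample characters are arbitrary choices for illustration.

```python
import unicodedata


def ident_role(ch: str) -> str:
    """Classify one character by the role it can play in a Python identifier."""
    if ch.isidentifier():
        return "start"      # may begin an identifier (letters, '_', ...)
    if ("a" + ch).isidentifier():
        return "continue"   # may follow the first character (digits, ...)
    return "never"          # cannot appear anywhere in an identifier


# Arbitrary sample characters chosen for illustration.
for ch in ["x", "7", "λ", "→", "“"]:
    print(repr(ch), unicodedata.category(ch), ident_role(ch))
# 'x' Ll start
# '7' Nd continue
# 'λ' Ll start
# '→' Sm never
# '“' Pi never
```

Characters in the "never" group, such as the arrow or the smart quote, are the ones for which an error message mentioning an identifier is misleading.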

An armchair-design solution would say: we should give the most appropriate answer for every possible Unicode character category.
That would require taking all the Unicode character categories and all the Python lexical categories and 'cross-producting' them: a humongous task for little advantage.

I don't see why this is a "humongous task". Anyway, your solution boils down to the simplest fix in the lexer, which is to block some characters from matching any category, does it not?

Block? Not sure what you mean… Nothing should change (in the simplest solution, at least) apart from better error messages.
My suggested solution involved this:
Currently the lexer (basically an automaton) reveals which state it's in when it throws an error involving "identifier".
Suggested change:

if in_ident_state:
    if current_char is allowable as ident_char:
        continue as before
    elif current_char is ASCII:
        usual error
    else:
        throw error eliding the "in_ident state"
else:
    as is...
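A minimal runnable sketch of that branch, assuming a toy scanner rather than CPython's actual C tokenizer. The function name `scan_identifier` and the message format are hypothetical. One liberty is taken: in a real lexer an ASCII character that cannot continue an identifier (space, `+`, `(`) usually just ends the token, so the sketch stops there and leaves any error to the usual machinery.

```python
def scan_identifier(src: str, start: int) -> int:
    """Scan an identifier whose first character is src[start].

    Hypothetical toy scanner, not CPython's tokenizer. Returns the index just
    past the identifier, or raises SyntaxError naming the offending character
    when a non-ASCII character that can never appear in an identifier follows.
    """
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ("a" + ch).isidentifier():
            i += 1  # allowable as an identifier character: continue as before
        elif ch.isascii():
            break   # ASCII: the identifier ends here; any error is the usual one
        else:
            # Non-ASCII and never valid in an identifier: report the character
            # itself instead of blaming the identifier being scanned.
            raise SyntaxError(f"invalid character {ch!r} (U+{ord(ch):04X})")
    return i


print(scan_identifier("total = 1", 0))  # 5
try:
    scan_identifier("x→y", 0)
except SyntaxError as err:
    print(err)  # invalid character '→' (U+2192)
```

The only state-dependent decision is the final `else`, which is why no Unicode-by-lexical-category table is needed for this simplest version.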

BTW, after my last post I tried some things and found other behavior in this area that is unsatisfactory (to me); to wit:

>>> x = 0o19
  File "<stdin>", line 1
    x = 0o19
           ^
SyntaxError: invalid syntax

Of course the 9 cannot appear in an octal constant, but "SyntaxError: invalid syntax"??
That seems a little overly general.
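A sketch of the more specific check this example seems to call for, assuming the lexer has already collected the whole token `0o19`. The function `check_octal_literal` and its message wording are hypothetical and not taken from any Python version.

```python
OCTAL_DIGITS = frozenset("01234567")


def check_octal_literal(token: str) -> int:
    """Hypothetical lexer-level check for a '0o...' token.

    Names the first invalid digit instead of reporting a generic syntax error.
    Underscore placement and empty literals are left to int(), which raises
    ValueError for them.
    """
    if token[:2].lower() != "0o":
        raise ValueError(f"not an octal literal: {token!r}")
    for ch in token[2:]:
        if ch != "_" and ch not in OCTAL_DIGITS:
            raise SyntaxError(f"invalid digit {ch!r} in octal literal {token!r}")
    return int(token, 8)  # int() accepts the 0o prefix when base is 8


print(check_octal_literal("0o17"))  # 15
try:
    check_octal_literal("0o19")
except SyntaxError as err:
    print(err)  # invalid digit '9' in octal literal '0o19'
```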

My preferred fix:
make LexicalError a subclass of SyntaxError.

Better messages for both cases (the identifier one and the octal one) should then follow.
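Sketched in Python, the proposal itself is a tiny class. `LexicalError` does not exist in Python, and `reject_char` is a hypothetical helper standing in for the lexer. Because the class subclasses `SyntaxError`, code that already catches `SyntaxError` keeps working.

```python
class LexicalError(SyntaxError):
    """Hypothetical: errors detected while tokenizing (bad characters, bad literals)."""


def reject_char(ch: str) -> None:
    # Hypothetical helper a lexer might call on an impermissible character.
    raise LexicalError(f"invalid character {ch!r} (U+{ord(ch):04X})")


try:
    reject_char("→")
except SyntaxError as err:           # existing handlers still catch it
    print(type(err).__name__, err)   # LexicalError invalid character '→' (U+2192)
```

Python already follows this pattern: `IndentationError` (and its subclass `TabError`) derive from `SyntaxError`.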

Disclaimer: I am a teacher, and a LexicalError category would make it easier to explain some core concepts.
However, I understand there are obviously more pressing priorities than making Python a superlative CS-teaching language :-)