<div dir="ltr"><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 19, 2016 at 7:21 AM Steven D'Aprano <<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:<br>

<br>

> There was this question on the python list a few days ago:<br>

> Subject: SyntaxError: Non-ASCII character<br>

[...]<br>

> I pointed out that the Python 2 error was more helpful (to my eyes) than<br>

> Python 3's<br>

<br>

And I pointed out how I thought the Python 3 error message could be<br>

improved, but the Python 2 error message was not very good.<br>

<br>

<br>

> Python3<br>

><br>

> Traceback (most recent call last):<br>

>   File "&lt;stdin&gt;", line 1, in &lt;module&gt;<br>

>   File "/home/ariston/foo.py", line 31<br>

>     wf = wave.open(“test.wav”, “rb”)<br>

>                        ^<br>

> SyntaxError: invalid character in identifier<br>

<br>

It would be much more helpful if the caret lined up with the offending<br>

character. Better still, if the offending character was actually stated:<br>

<br>

    wf = wave.open(“test.wav”, “rb”)<br>

                   ^<br>

SyntaxError: invalid character '“' in identifier<br>
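
<br>

A minimal sketch of how a message like that might be assembled, assuming the<br>
column of the offending character is already known. describe_bad_char is a<br>
hypothetical helper for illustration only; it is not how CPython's tokenizer works.<br>

```python
import unicodedata


def describe_bad_char(line, col):
    """Return (caret_line, message) for the character at index col.

    Hypothetical helper: it only illustrates the proposed message format.
    """
    ch = line[col]
    # Fall back to a placeholder if the code point has no Unicode name.
    name = unicodedata.name(ch, "unnamed character")
    caret = " " * col + "^"
    message = "SyntaxError: invalid character {!r} (U+{:04X}, {})".format(
        ch, ord(ch), name
    )
    return caret, message


line = 'wf = wave.open(“test.wav”, “rb”)'
col = line.index("“")  # first curly quote on the line
caret, message = describe_bad_char(line, col)
print(line)
print(caret)
print(message)
```

Naming the code point as well as showing it may help when the character looks<br>
almost identical to an ASCII one in the user's editor.<br>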

<br>

<br>

> Python2<br>

><br>

> Traceback (most recent call last):<br>

>   File "&lt;stdin&gt;", line 1, in &lt;module&gt;<br>

>   File "foo.py", line 31<br>

> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no<br>

> encoding declared; see <a href="http://python.org/dev/peps/pep-0263/" rel="noreferrer" target="_blank">http://python.org/dev/peps/pep-0263/</a> for details<br>

<br>

As I pointed out earlier, this is less helpful. The line itself is not<br>

shown (although the line number is given), nor is the offending<br>

character. (Python 2 can't show the character because it doesn't know<br>

what it is -- it only knows the byte value, not the encoding.) But in<br>

the person's text editor, chances are they will see what looks to them<br>

like a perfectly reasonable character, and have no idea which one<br>

corresponds to the byte \xe2.<br>
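
<br>

To illustrate where \xe2 comes from, here is a short Python 3 sketch, assuming<br>
the file was saved as UTF-8 (the original report does not say which encoding was used).<br>

```python
# The curly quote is one character but three bytes in UTF-8.
quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK
encoded = quote.encode("utf-8")
print(encoded)  # b'\xe2\x80\x9c'

# Python 2 reports only the first byte of that sequence.
first_byte = encoded[:1]

# On its own that byte is not valid UTF-8 ...
try:
    first_byte.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# ... and under Latin-1 it would be a different character entirely.
as_latin1 = first_byte.decode("latin-1")
print(decodable, as_latin1)
```

So the byte named in the message does not identify the character unless the<br>
encoding is known, which is the point made above.<br>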

<br>

<br>

<br>

> IOW<br>

> 1. The lexer is internally (evidently from the error message) so<br>

> ASCII-oriented that any “unicode-junk” just defaults out to identifiers<br>

> (presumably comments are dealt with earlier) and then if that lexing action<br>

> fails it mistakenly pinpoints a wrong *identifier* rather than just an<br>

> impermissible character like python 2<br>

<br>

You seem to be jumping to a rather large conclusion here. Even if you<br>

are right that the lexer considers all otherwise-unexpected characters<br>

to be part of an identifier, why is that a problem?<br></blockquote><div><br></div><div>It's a problem because those characters could never be part of an identifier.  So it seems like a bug.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I agree that it is mildly misleading to say<br>

<br>

invalid character '“' in identifier<br>

<br>

when “ is not part of an identifier:<br>

<br>

py> '“test'.isidentifier()<br>

False<br>

<br>

but I don't think you can jump from that to your conclusion that<br>

Python's unicode support is somewhat "wrongheaded". Surely a much<br>

simpler, less inflammatory response would be to say that this one<br>

specific error message could be improved?<br>

<br>

But... is it REALLY so bad? What if we wrote it like this instead:<br>

<br>

py> result = my§function(arg)<br>

  File "&lt;stdin&gt;", line 1<br>

    result = my§function(arg)<br>

                        ^<br>

SyntaxError: invalid character in identifier<br>

<br>

Isn't it more reasonable to consider that "my§function" looks like it is<br>

intended as an identifier, but it happens to have an illegal character<br>

in it?<br>
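
<br>

A small sketch of how one might check both examples from the interactive<br>
prompt. The sample strings are taken from this thread, apart from "naïve",<br>
which is added here only as a contrasting non-ASCII identifier.<br>

```python
import unicodedata

samples = ["test", "“test", "my§function", "naïve"]

# str.isidentifier() applies the identifier rules of the language definition.
results = {s: s.isidentifier() for s in samples}

# Unicode general categories of the two offending characters.
categories = {ch: unicodedata.category(ch) for ch in "“§"}

for s in samples:
    print(repr(s), results[s])
for ch, cat in categories.items():
    print(repr(ch), cat)
```

Both characters fall in punctuation categories (Pi and Po), which are not<br>
allowed in identifiers, whereas a non-ASCII letter such as ï is accepted.<br>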

<br>

> combine that with<br>

> 2. matrix mult (@) Ok to emulate perl but not to go outside ASCII<br>

<br>

How does @ emulate Perl?<br>

<br>

As for your second part, about not going outside of ASCII, yes, that is<br>

official policy for Python operators, keywords and builtins.<br>

<br>

<br>

> makes it seem (to me) Python's Unicode support is somewhat wrongheaded.<br>

<br>

<br>

--<br>

Steve<br>


</blockquote></div></div>