On Wed, Jul 20, 2016 at 06:16:10PM -0300, Danilo J. S. Bellini wrote:
1. Using SyntaxError for lexical errors sounds as strange as saying a misspelling/typo is a syntax mistake in a natural language.
Why? Regardless of whether the error is found by the tokeniser, the lexer, the parser, or something else, it is still a *syntax error*. Why would the programmer need to know, or care, which part of the compiler/interpreter detects the error?

Also consider that not all Python interpreters will divide up the task of interpreting code in exactly the same way. Tokenisers, lexers and parsers are very closely related and not necessarily distinct. Should the *exact same typo* generate TokenError in one Python, LexerError in another, and ParserError in a third? What is the advantage of that?
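To make the point concrete, here is a sketch of what that unified behaviour looks like in practice (the exact message wording varies between CPython versions):

    >>> compile("x = 1 € 2", "<test>", "exec")
    Traceback (most recent call last):
      ...
    SyntaxError: invalid character '€' (U+20AC)

The illegal euro sign is caught by the tokeniser, but the programmer just sees a SyntaxError, the same as for any other malformed code.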
2. About those lexical error messages: a caret is worse than no caret when it's misaligned, and unless I'm missing something, one can't guarantee that the terminal is printing the error message in the right encoding. Including the row and column numbers in the message would be helpful.
It would be nice for the caret to point to the illegal character, but it's not *wrong* to point past it to the end of the token that contains the illegal character.
4. Unicode has more than one code point for some symbols that look alike; for example, "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. [...]
Not really. Look at their names:

    GREEK CAPITAL LETTER SIGMA
    MATHEMATICAL BOLD CAPITAL SIGMA
    MATHEMATICAL ITALIC CAPITAL SIGMA
    MATHEMATICAL BOLD ITALIC CAPITAL SIGMA
    MATHEMATICAL SANS-SERIF BOLD CAPITAL SIGMA
    MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL SIGMA

Personally, I don't understand why the Unicode Consortium has included all these variants. But whatever the reason, the names hint strongly that they have specialised purposes, and shouldn't be used when you want the letter Σ. But, if you do, Python will normalise them all to Σ, so there's no real harm done, except to the readability of your code.

[...]
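That normalisation is easy to verify (a minimal sketch; identifier normalisation to NFKC form is specified by PEP 3131):

    import unicodedata

    # NFKC collapses the mathematical variants to the plain letter.
    for ch in "Σ𝚺𝛴𝜮𝝨𝞢":
        print(unicodedata.name(ch), "->", unicodedata.normalize("NFKC", ch))

    # Consequently, all six spellings name the *same* variable:
    𝚺 = 42    # MATHEMATICAL BOLD CAPITAL SIGMA
    print(Σ)  # prints 42 -- the identifier was normalised to plain SIGMA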
when editing code with a Unicode char like that, most people would probably copy and paste the symbol instead of typing it, leading to consistent use of the same symbol.
You are assuming that the programmer's font includes glyphs for all six of those code points. More likely, the programmer will see Σ for the first code point, and the other five will display as a pair of "missing glyph" boxes. (That's exactly what I see in my mail client, and in the Python interpreter.)

Why a pair of boxes? Because they are code points in the Supplementary Multilingual Plane, and require *two* 16-bit code units in UTF-16. So naive Unicode software with poor support for the supplementary planes will display two boxes, one for each surrogate code point.

Even if the code points display correctly, with distinct glyphs, your comment that most people will be forced to copy and paste the symbol is precisely why I am reluctant to see Python introduce non-ASCII keywords or operators. It's a pity, because I think that non-ASCII operators at least can make for a much richer language (although I wouldn't want to see anything as extreme as APL). Perhaps I will change my mind in a few more years, as the popularity of emoji encourages more applications to have better support for non-ASCII and the supplementary planes.

[...]
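The two-code-unit behaviour is easy to demonstrate (a small sketch using MATHEMATICAL BOLD CAPITAL SIGMA):

    # Code points above U+FFFF need two 16-bit code units in UTF-16;
    # naive software renders one "missing glyph" box per code unit.
    ch = "\U0001D6BA"                    # MATHEMATICAL BOLD CAPITAL SIGMA
    print(hex(ord(ch)))                  # 0x1d6ba -- beyond the BMP
    print(ch.encode("utf-16-be").hex())  # d835deba -- a surrogate pair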
6. Python 3 source code is UTF-8 by default, and Unicode identifiers are allowed. Not having Unicode keywords is merely a holdover from Python 2 behaviour, which emphasised ASCII-only code (apart from comments and strings).
No, it is a *policy decision*, not a holdover from Python 2. Python 2 didn't support non-ASCII identifiers either, but Python 3 intentionally broke with that.
7. The discussion isn't about pro- or anti-lambda bias; it's about keyword naming and readability. Who gains and who loses from that feature? It won't hurt those who never use lambda and never use Unicode identifiers.
It will hurt those who have to read code containing a mysterious λ whose meaning they don't know and which they have no idea how to search for. At least "python lambda" is easy to search for.

It will hurt those who want to use λ as an identifier. I include myself in that category: I don't want λ to be reserved as a keyword.

I look at it like this: using λ as a keyword makes as much sense as making f a keyword so that we can save a few characters by writing:

    f myfunction(arg, x, y):
        pass

instead of def. I use f as an identifier in many places, e.g.:

    for f in list_of_functions:
        ...

or in functional code:

    compose(f, g)

Yes, I can *work around it* by naming things f_ instead of f, but that's ugly. Even though it saves a few keystrokes, I wouldn't want f to be reserved as a keyword, and the same goes for λ as lambda.
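Remember that λ is a perfectly legal identifier in Python 3 today, so reserving it would break existing code. A tiny sketch (the names here are made up for illustration):

    import math

    λ = 0.5                     # a decay rate; legal Python 3 today
    def decay(t):
        return math.exp(-λ * t)

    print(decay(1.0))           # 0.6065306597126334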
8. I don't know if any consensus can emerge on this matter of lambdas, but there's another subject that can be discussed alongside it: macros.
I'm pretty sure that Guido has ruled "Over My Dead Body" on anything resembling macros in Python. However, we can experiment with adding keywords and macro-like facilities without Guido's permission. For example:

http://www.staringispolite.com/likepython/

It's a joke, of course, but the technology is real. Imagine, if you will, that you could declare a "dialect" at the start of a Python module, just after the optional coding cookie:

    # -*- coding: utf-8 -*-
    # -*- dialect math -*-

which would tell importlib to run the code through some sort of source/AST transformation before importing it. That would allow us to localise the keywords, introduce new operators, and do all the other things Guido hates *wink*, and still be able to treat the code as normal Python.

A bad idea? Probably an awful one. But it's worth experimenting with. It will be fun, and it *just might* turn out to be a good idea.

For the record, in the 1980s and 1990s, Apple used a similar idea for two of their scripting languages, HyperTalk and AppleScript, allowing users to localise keywords. HyperTalk is now defunct, and AppleScript has dropped that feature, which suggests that it is a bad idea. Or maybe it was just ahead of its time.
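For what it's worth, such a hook can be prototyped today with the standard importlib machinery. A minimal sketch, assuming a made-up "dialect" that simply rewrites λ into lambda (DialectLoader and the rewrite rule are inventions for illustration, not a real design):

    import sys
    import importlib.machinery
    import importlib.util

    class DialectLoader(importlib.machinery.SourceFileLoader):
        """Hypothetical loader: transform the source before compiling."""
        def source_to_code(self, data, path, *, _optimize=-1):
            source = importlib.util.decode_source(data)
            # Crude stand-in for a real source/AST transformation:
            source = source.replace("λ", "lambda")
            return compile(source, path, "exec",
                           dont_inherit=True, optimize=_optimize)

    # Install the loader ahead of the default machinery for .py files.
    sys.path_hooks.insert(
        0, importlib.machinery.FileFinder.path_hook((DialectLoader, [".py"])))
    sys.path_importer_cache.clear()

This sketch doesn't parse a dialect cookie; a real experiment would inspect the first few lines of the source and choose a transformation accordingly.

-- Steve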