[Python-ideas] allow `lambda' to be spelled λ

Danilo J. S. Bellini danilo.bellini at gmail.com
Wed Jul 20 17:16:10 EDT 2016


1. Using SyntaxError for lexical errors sounds as strange as calling a
misspelling/typo a syntax mistake in a natural language. A new
"LexicalError" or "TokenizerError" for that makes sense. Perhaps both this
new exception and SyntaxError should inherit from a new CompileError class.
But SyntaxError already covers similar cases, such as TabError (an
IndentationError), which is a lexical analysis error, not a parser one [1].
To avoid such changes while keeping the name, at least the SyntaxError
docstring should be "Compile-time error." instead of "Invalid Syntax.", and
the documentation should be explicit that it isn't only about
parsing/syntax/grammar but also about lexical analysis errors.
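
A minimal sketch of point 1, assuming CPython 3 (the helper name
compile_error_type is made up for illustration). It shows that errors
detected during lexical analysis are reported through the SyntaxError
hierarchy:

```python
def compile_error_type(source):
    """Return the name of the exception raised when compiling `source`, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # also catches IndentationError and TabError
        return type(exc).__name__
    return None


# TabError -> IndentationError -> SyntaxError
hierarchy = [cls.__name__ for cls in TabError.__mro__[:3]]

# An invalid character is rejected while tokenizing, yet it surfaces as SyntaxError.
invalid_char = compile_error_type("total = 1 ∑ 2\n")

# A tab followed by eight spaces at the same level is ambiguous indentation.
mixed_indent = compile_error_type("if True:\n\tx = 1\n        y = 2\n")

print(hierarchy, invalid_char, mixed_indent)
```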

2. About those lexical error messages, a misaligned caret is worse than no
caret at all, and unless I'm missing something, one can't guarantee that
the terminal is printing the error message with the right encoding.
Including the row and column numbers in the message would be helpful.
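
As a rough illustration of point 2, SyntaxError instances already carry
lineno and offset attributes, so a message can state the position without
depending on caret alignment. The function name and the message layout
below are assumptions made for this sketch, not an existing CPython format:

```python
def describe_syntax_error(source, filename="<example>"):
    """Compile `source`; on failure return a message with explicit row/column."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # lineno and offset are 1-based; offset may be None in rare cases.
        return "{}: line {}, column {}: {}".format(
            filename, exc.lineno, exc.offset, exc.msg
        )
    return None


message = describe_syntax_error("x = 1\ny = $\n")
print(message)
```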

3. There are people who like and use Unicode chars in identifiers. Usually
I don't like to translate comments/identifiers to another language, but I
did so myself, using variable names with accents in Portuguese for a talk
[2], mostly to give it a try. Surprisingly, few people noticed it until I
pointed it out. The same can be said about SymPy scripts, where symbols
like Greek letters would be meaningful (e.g. μ for the mean, σ for the
standard deviation and Σ for the covariance matrix), so I'd argue it's
quite natural.
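
A small sketch of point 3 using only the standard library (the sample
data and the Portuguese name amostras are invented for the example):

```python
import statistics

amostras = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # "samples" in Portuguese

μ = statistics.mean(amostras)    # mean
σ = statistics.pstdev(amostras)  # population standard deviation

print("μ = {}, σ = {}".format(μ, σ))
```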

4. Unicode has more than one codepoint for some symbols that look alike,
for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. There's also "∑",
but this one is invalid in Python 3. The italic/bold/serif distinction
seems enough to tell them apart, and when editing code with a Unicode
char like that, most people would probably copy and paste the symbol
instead of typing it, leading to a consistent use of the same symbol.
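
The claims in point 4 can be checked with the sketch below. One caveat
worth knowing: Python 3 converts identifiers to NFKC while parsing (see
PEP 3131), so the styled sigmas are all accepted but end up naming the
same variable as the plain Σ.

```python
import unicodedata

sigmas = "Σ𝚺𝛴𝜮𝝨𝞢"  # the six look-alike capital sigmas quoted above
summation = "∑"     # U+2211 N-ARY SUMMATION

for ch in sigmas + summation:
    print("U+{:04X} {:<50} identifier={}".format(
        ord(ch), unicodedata.name(ch), ch.isidentifier()))

# NFKC folds the bold/italic/sans-serif variants onto the plain Greek letter.
normalized = {unicodedata.normalize("NFKC", ch) for ch in sigmas}

# Assigning through a styled sigma stores the value under the plain one.
namespace = {}
exec("𝚺 = 42", namespace)
print(normalized, namespace["Σ"])
```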

5. New keywords, no matter whether they fit into 7-bit ASCII or require
Unicode, unavoidably break backwards compatibility at least to some
degree. That happened with the "nonlocal" keyword in Python 3, for
example.
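
To make point 5 concrete, here is a hedged sketch (can_be_assigned is a
hypothetical helper): "nonlocal" can no longer be used as a name in
Python 3, while "λ" currently can, so any code already using λ as an
identifier would break if it became a keyword.

```python
import keyword


def can_be_assigned(name):
    """Return True if `name = 1` compiles, i.e. `name` is usable as an identifier."""
    try:
        compile("{} = 1".format(name), "<example>", "exec")
    except SyntaxError:
        return False
    return True


for name in ("nonlocal", "lambda", "λ"):
    print(name, keyword.iskeyword(name), can_be_assigned(name))
```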

6. Python 3 source code is UTF-8 by default and Unicode identifiers are
allowed. Not having Unicode keywords is merely contingent on Python 2
behavior, which emphasized ASCII-only code (besides comments and strings).

7. The discussion isn't about lambda or anti-lambda bias, it's about
keyword naming and readability. Who gains/loses with that feature? It
won't hurt those who never use lambda and never use Unicode identifiers.
Perhaps SymPy users would feel harmed by that, as well as users of other
scientific packages, but searching GitHub for the "λ" char I found no one
using it alone within Python code. The online Python books written in
Greek that I found were using only English identifiers.

8. I don't know if any consensus can emerge in this matter about lambdas,
but there's another subject that can be discussed together: macros. What OP
wants is exactly a "#define λ lambda", which would exist only in the code
that uses/needs such a symbol with that meaning. A minimal lexical macro
that just replaces a single identifier-like token with a keyword token is
enough for him. I don't know a nice way to do that; something like "from
__replace__ import lambda_to_λ" or even "def λ is lambda" would avoid new
keywords, but I also don't know how desired this feature is (perhaps to
translate the language keywords to another language?).
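
The lexical macro from point 8 can be approximated today as a source
preprocessing step with the standard tokenize module. This is only a
sketch under that assumption: expand_lambda_alias is a hypothetical
helper, not an existing or proposed Python feature, and the rewritten
source still has to be compiled or executed separately.

```python
import io
import tokenize


def expand_lambda_alias(source, alias="λ"):
    """Replace every NAME token equal to `alias` with the keyword `lambda`.

    Strings and comments are separate token types, so they stay untouched.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == alias:
            tok = tok._replace(string="lambda")
        tokens.append(tok)
    return tokenize.untokenize(tokens)


source = "double = λ x: 2 * x\nlabel = 'λ'\n"
namespace = {}
exec(expand_lambda_alias(source), namespace)
print(namespace["double"](21), namespace["label"])
```

Working on tokens instead of raw text is what keeps the replacement
from touching a "λ" that appears inside a string literal or a comment.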

9. I really don't like the editor "magic"; it would be better to create a
packaging/setup.py translation script than that (something like 2to3). It's
not about coloring/highlighting, nor about editor/IDE features, it's
about seeing the object/file itself, and colors never change that AFAIK.
Also, most code I read isn't viewed in my editor; sometimes it comes from
cat/diff (terminal stdout output), vim/gedit/pluma (editor),
GitHub/BitBucket (web), blogs/forums/e-mails, gitk, Spyder (IDE), etc.
That kind of "view" replacement would compromise some code alignment (e.g.
multiline strings/comments) and line length, besides being a problem when
looking for code with tools like find + grep/sed/awk (which I use all the
time). Still worse are the git hooks to perform the replacement
before/after a commit: how should one test code that uses that? It
somehow feels out of control.

[1] https://docs.python.org/3/reference/lexical_analysis.html
[2] http://www.slideshare.net/djsbellini/20140416-garoa-hc-strategy

2016-07-20 13:44 GMT-03:00 Stephen J. Turnbull <
turnbull.stephen.fw at u.tsukuba.ac.jp>:

> Nick Coghlan writes:
>
>  > The reason that can help is that the main problem with "improving"
>  > error messages, is that it can be really hard to tell whether the
>  > improvements are actually improvements or not
>
> Personally, I think the real issue here is that the curly quote (and
> things like mathematical PRIME character) are easily confused with
> Python syntax and it all looks like grit on Tim's monitor.  I tried
> substituting an emoticon and the DOUBLE INTEGRAL, and it was quite
> obvious what was wrong from the Python 3 error message.<wink/>
>
> However, in this case, as far as I can tell from the error messages
> induced by playing with ASCII, Python 3.5 thinks that all non-
> identifier ASCII characters are syntactic (so for example it says that
>
>     with open($file.txt") as f:
>
> is "invalid syntax").  But for non-ASCII characters (I guess including
> the Latin 1 set?) they are either letters, numerals, or just plain not
> valid in a Python program AIUI (outside of strings and comments, of
> course).
>
> I would think the lexer could just treat each invalid character as an
> invalid_token, which is always invalid in Python syntax, and the error
> would be a SyntaxError with the message formatted something like
>
>     "invalid character {} = U+{:04X}".format(ch, ord(ch))
>
> This should avoid the strange placement of the position indicator,
> too.
>
> If someday we decide to use a non-ASCII character for a syntactic
> purpose, that's a big enough compatibility break in itself that
> changing the invalid character set (and thus the definition of
> invalid_token) is insignificant.
>
> I'm pretty sure this is what a couple of earlier posters have in mind,
> too.



-- 
Danilo J. S. Bellini
---------------
"*It is not our business to set up prohibitions, but to arrive at
conventions.*" (R. Carnap)