[Python-ideas] allow `lambda' to be spelled λ
Steven D'Aprano
steve at pearwood.info
Thu Jul 21 10:25:37 EDT 2016
On Wed, Jul 20, 2016 at 06:16:10PM -0300, Danilo J. S. Bellini wrote:
> 1. Using SyntaxError for lexical errors sounds as strange as saying a
> misspelling/typo is a syntax mistake in a natural language.
Why? Regardless of whether the error is found by the tokeniser, the
lexer, the parser, or something else, it is still a *syntax error*. Why
would the programmer need to know, or care, what part of the
compiler/interpreter detects the error?
Also consider that not all Python interpreters will divide up the task
of interpreting code exactly the same way. Tokenisers, lexers and
parsers are very closely related and not necessarily distinct. Should
the *exact same typo* generate TokenError in one Python, LexerError in
another, and ParserError in a third? What is the advantage of that?
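CPython already works this way. The sketch below assumes CPython 3. Exact messages differ between versions, so it inspects only the exception class. It compiles three snippets: an illegal character rejected by the tokeniser, a malformed statement rejected by the parser, and a missing indent. All three surface as SyntaxError or a subclass of it. The names `SNIPPETS` and `error_class()` are made up for this example:

```python
# Each snippet fails at a different stage of CPython's front end,
# but every failure is reported as SyntaxError (or a subclass of it).
SNIPPETS = {
    "illegal character (tokeniser)": "x = 1 \u00a4 2\n",
    "malformed statement (parser)": "x = = 2\n",
    "bad indentation": "if True:\nx = 1\n",
}


def error_class(source):
    """Compile source and return the exception class raised, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return type(exc)
    return None


for label, source in SNIPPETS.items():
    cls = error_class(source)
    print(f"{label:32} {cls.__name__}  is SyntaxError: {issubclass(cls, SyntaxError)}")
```

IndentationError is itself a subclass of SyntaxError. So even the indentation case can be caught with a single `except SyntaxError`.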
> 2. About those lexical error messages, the caret is worse than the lack of
> it when it's not aligned, but unless I'm missing something, one can't
> guarantee that the terminal is printing the error message with the right
> encoding. Including the row and column numbers in the message would be
> helpful.
It would be nice for the caret to point to the illegal character, but
it's not *wrong* to point past it to the end of the token that contains
the illegal character.
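The exception object already carries the row and column as `lineno` and `offset`. A tool that can't trust caret alignment can report them as plain text instead. This is a minimal sketch. `describe()` is a hypothetical helper. The exact meaning of `offset` has been refined across CPython releases, so treat the column as approximate:

```python
def describe(source, filename="<input>"):
    """Return 'file:line:col: message' for a syntax error, or None if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # lineno is 1-based; offset is a 1-based column (may be None in edge cases).
        return f"{exc.filename}:{exc.lineno}:{exc.offset}: {exc.msg}"
    return None


print(describe("x = 1\ny = 2 \u00a4 3\n"))
```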
> 4. Unicode has more than one codepoint for some symbols that look alike,
> for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. [...]
Not really. Look at their names:
GREEK CAPITAL LETTER SIGMA
MATHEMATICAL BOLD CAPITAL SIGMA
MATHEMATICAL ITALIC CAPITAL SIGMA
MATHEMATICAL BOLD ITALIC CAPITAL SIGMA
MATHEMATICAL SANS-SERIF BOLD CAPITAL SIGMA
MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL SIGMA
Personally, I don't understand why the Unicode Consortium has included
all these variants. But whatever the reason, the names hint strongly
that they have specialised purposes, and shouldn't be used when you want
the letter Σ.
But, if you do, Python will normalise them all to Σ, so there's no real
harm done, except to the readability of your code.
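Python's own `unicodedata` module confirms the names above and shows the normalisation at work. The sketch assumes CPython 3. The characters are written as escapes so the example survives fonts and mail clients that can't display them:

```python
import unicodedata

# The six sigmas from the quoted message, as escapes.
SIGMAS = "\u03a3\U0001d6ba\U0001d6f4\U0001d72e\U0001d768\U0001d7a2"

for ch in SIGMAS:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Identifiers are normalised with NFKC (PEP 3131), which folds every
# mathematical variant back to GREEK CAPITAL LETTER SIGMA.
print(unicodedata.normalize("NFKC", SIGMAS) == "\u03a3" * 6)  # True

# So assigning to the bold sigma and then updating the plain one
# touches the very same variable.
namespace = {}
exec("\U0001d6ba = 1\n\u03a3 += 1\n", namespace)
print(namespace["\u03a3"])  # 2
```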
[...]
> when editing code with a Unicode
> char like that, most people would probably copy and paste the symbol
> instead of typing it, leading to a consistent use of the same symbol.
You are assuming that the programmer's font includes glyphs for all
six of those code points. More likely, the programmer will see Σ for the
first code point, and the other five will display as a pair of "missing
glyph" boxes. (That's exactly what I see in my mail client, and in the
Python interpreter.)
Why a pair of boxes? Because they are code points in the Supplementary
Multilingual Plane (SMP), and require *two* 16-bit code units in UTF-16.
So naive Unicode software with poor support for the SMP will display two
boxes, one for each surrogate code point.
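The arithmetic behind the two boxes works like this. The sketch encodes each sigma as UTF-16 and splits it into 16-bit code units. It then cross-checks the result against the standard surrogate formula. That formula subtracts 0x10000, puts the high 10 bits in a unit based at 0xD800, and puts the low 10 bits in a unit based at 0xDC00. `utf16_units()` and `surrogate_pair()` are hypothetical helpers written for this example:

```python
SIGMAS = "\u03a3\U0001d6ba\U0001d6f4\U0001d72e\U0001d768\U0001d7a2"


def utf16_units(ch):
    """Return the UTF-16 code units for one code point, using the codec."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(cp):
    """Split a code point above U+FFFF into (high, low) surrogates by hand."""
    if cp <= 0xFFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    v = cp - 0x10000  # a 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


for ch in SIGMAS:
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X} -> {' '.join(f'{u:04X}' for u in units)}")
# U+03A3 needs one unit; U+1D6BA becomes the pair D835 DEBA.
```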
Even if the code points display correctly, with distinct glyphs, your
comment that most people will be forced to copy and paste the symbol is
precisely why I am reluctant to see Python introduce non-ASCII keywords
or operators. It's a pity, because I think that non-ASCII operators at
least can make a much richer language (although I wouldn't want to see
anything as extreme as APL). Perhaps I will change my mind in a few more
years, as the popularity of emoji encourages more applications to have
better support for non-ASCII and the SMP.
[...]
> 6. Python 3 code is UTF-8 and Unicode identifiers are allowed. Not having
> Unicode keywords is merely contingent on Python 2 behavior that emphasized
> ASCII-only code (besides comments and strings).
No, it is a *policy decision*. It is not because Python 2 didn't support
them. Python 2 didn't support non-ASCII identifiers either, but Python 3
intentionally broke with that.
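The standard library makes this policy visible. Non-ASCII letters are valid identifiers, while `keyword.kwlist` is pure ASCII and doesn't include λ. The sketch assumes Python 3.7 or later (for `str.isascii`):

```python
import keyword

# Identifiers may use non-ASCII letters (PEP 3131)...
print("\u03bb".isidentifier(), "\u03a3".isidentifier())  # True True

# ...but every hard keyword is plain ASCII, and λ is not one of them.
print(all(kw.isascii() for kw in keyword.kwlist))  # True
print(keyword.iskeyword("\u03bb"))                  # False
```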
> 7. The discussion isn't about lambda or anti-lambda bias, it's about
> keyword naming and readability. Who gains/loses with that resource? It
> won't hurt those who never use lambda and never use Unicode identifiers.
It will hurt those who have to read code containing a mysterious λ whose
meaning they don't know and which they have no idea how to search for. At
least "python lambda" is easy to search for.
It will hurt those who want to use λ as an identifier. I include myself
in that category. I don't want λ to be reserved as a keyword.
I look at it like this: using λ as a keyword makes as much sense as making
f a keyword so that we can save a few characters by writing:
    f myfunction(arg, x, y):
        pass
instead of def. I use f as an identifier in many places, e.g.:
    for f in list_of_functions:
        ...
or in functional code:
    compose(f, g)
Yes, I can *work around it* by naming things f_ instead of f, but that's
ugly. Even though it saves a few keystrokes, I wouldn't want f to be
reserved as a keyword, and the same goes for λ as lambda.
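To make the cost concrete, here is the kind of ordinary code that uses f, g and λ as plain names. All of it is valid Python today, and it would stop compiling if f or λ were reserved. `compose()` and `exponential_decay()` are hypothetical helpers written for this sketch, not standard library functions:

```python
import math
from functools import reduce


def compose(*functions):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        return reduce(lambda acc, fn: fn(acc), reversed(functions), x)
    return composed


def exponential_decay(x0, λ, t):
    """x(t) = x0 * exp(-λ * t), with λ as the conventional decay-rate name."""
    return x0 * math.exp(-λ * t)


# f as a loop variable over a list of functions...
list_of_functions = [abs, float, str]
for f in list_of_functions:
    print(f(-3))

# ...f and g as arguments to compose(), and λ as a parameter name.
f, g = len, str
print(compose(f, g)(12345))                # 5
print(exponential_decay(100.0, 0.0, 3.0))  # 100.0
```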
> 8. I don't know if any consensus can emerge in this matter about lambdas,
> but there's another subject that can be discussed together: macros.
I'm pretty sure that Guido has ruled "Over My Dead Body" to anything
resembling macros in Python.
However, we can experiment with adding keywords and macro-like
facilities without Guido's permission. For example:
http://www.staringispolite.com/likepython/
It's a joke, of course, but the technology is real.
Imagine, if you will, that you could declare a "dialect" at the
start of Python modules, just after the optional encoding cookie:
    # -*- coding: utf-8 -*-
    # -*- dialect math -*-
which would tell importlib to run the code through some sort of
source/AST transformation before importing it. That would allow us to
localise the keywords, introduce new operators, and do all the other things
Guido hates *wink* and still be able to treat the code as normal Python.
A bad idea? Probably an awful one. But it's worth experimenting with: it
will be fun, and it *just might* turn out to be a good idea.
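Here is a minimal sketch of the source-transformation half of that idea. It assumes a made-up marker format and a hypothetical `DIALECTS` table that maps λ to `lambda`. It rewrites only NAME tokens, so ordinary string literals and comments are left alone. Hooking it into importlib is not shown. One plausible route would be a custom loader whose `source_to_code()` calls `transform()` before compiling. Note the trade-off discussed above: inside such a module, λ can no longer be used as a name.

```python
import io
import re
import tokenize

# Hypothetical registry: dialect name -> {alias: real Python keyword}.
DIALECTS = {"math": {"\u03bb": "lambda"}}

# Matches a marker such as "# -*- dialect math -*-" (the format is made up).
DIALECT_RE = re.compile(r"^#.*?\bdialect[:=]?\s+([-\w.]+)")


def find_dialect(source):
    """Look for a dialect marker in the first two lines, like a coding cookie."""
    for line in source.splitlines()[:2]:
        match = DIALECT_RE.match(line)
        if match:
            return match.group(1)
    return None


def transform(source):
    """Rewrite dialect aliases (NAME tokens only) into plain Python source."""
    dialect = find_dialect(source)
    if dialect is None:
        return source  # ordinary module: λ stays an ordinary identifier
    aliases = DIALECTS[dialect]
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in aliases:
            tok = tok._replace(string=aliases[tok.string])
        tokens.append(tok)
    # untokenize() re-spaces from the original positions, so the longer
    # keyword simply pushes the rest of its line to the right.
    return tokenize.untokenize(tokens)


SOURCE = (
    "# -*- coding: utf-8 -*-\n"
    "# -*- dialect math -*-\n"
    "square = \u03bb x: x * x\n"
)
namespace = {}
exec(compile(transform(SOURCE), "<dialect>", "exec"), namespace)
print(namespace["square"](7))  # 49
```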
For the record, in the 1980s and 1990s, Apple used a similar idea for
two of their scripting languages, HyperTalk and AppleScript, allowing
users to localise keywords. HyperTalk is now defunct, and AppleScript
has dropped that feature, which suggests that it is a bad idea. Or maybe
it was just ahead of its time.
--
Steve