[Python-ideas] allow `lambda' to be spelled λ

Steven D'Aprano steve at pearwood.info
Thu Jul 21 10:25:37 EDT 2016

On Wed, Jul 20, 2016 at 06:16:10PM -0300, Danilo J. S. Bellini wrote:

> 1. Using SyntaxError for lexical errors sounds as strange as saying a
> misspell/typo is a syntax mistake in a natural language.

Why? Regardless of whether the error is found by the tokeniser, the 
lexer, the parser, or something else, it is still a *syntax error*. Why 
would the programmer need to know, or care, what part of the 
compiler/interpreter detects the error?

Also consider that not all Python interpreters will divide up the task 
of interpreting code exactly the same way. Tokenisers, lexers and 
parsers are very closely related and not necessarily distinct. Should 
the *exact same typo* generate TokenError in one Python, LexerError in 
another, and ParserError in a third? What is the advantage of that?
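
For what it's worth, CPython already reports all of these the same way. The following is only a minimal sketch (the helper name error_type and the sample snippets are illustrative, not from any library): an illegal character, an unterminated string and a misplaced operator are plausibly detected at different stages, yet each surfaces as SyntaxError.

```python
def error_type(source):
    """Compile source and return the name of the exception raised, if any."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return type(exc).__name__
    return None

# Three mistakes that different compiler stages might notice first.
samples = {
    "illegal character": "x = 1 $ 2\n",
    "unterminated string": "s = 'abc\n",
    "misplaced operator": "x = = 1\n",
}

for label, source in samples.items():
    print(label, "->", error_type(source))
```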

> 2. About those lexical error messages, the caret is worse than the lack of
> it when it's not aligned, but unless I'm missing something, one can't
> guarantee that the terminal is printing the error message with the right
> encoding. Including the row and column numbers in the message would be
> helpful.

It would be nice for the caret to point to the illegal character, but 
it's not *wrong* to point past it to the end of the token that contains 
the illegal character.
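
The row and column are already available on the exception itself, independent of how the terminal renders the caret. A small sketch, assuming CPython and a made-up two-line snippet:

```python
source = "x = 1\ny = 2 $ 3\n"

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    error = exc

# lineno is 1-based; offset is the column the interpreter chose to report,
# which may be the illegal character itself or a nearby token boundary.
print("line:", error.lineno, "column:", error.offset)
print("text:", error.text)
```

Which column is reported can differ between interpreter versions, so treat the offset as a hint rather than an exact position.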

> 4. Unicode have more than one codepoint for some symbols that look alike,
> for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. Ther

Not really. Look at their names:


Personally, I don't understand why the Unicode Consortium has included 
all these variants. But whatever the reason, the names hint strongly 
that they have specialised purposes, and shouldn't be used when you want 
the letter Σ.

But, if you do, Python will normalise them all to Σ, so there's no real 
harm done, except to the readability of your code.
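
A quick sketch of that normalisation, using only the standard library (the value 42 and the dict name are arbitrary):

```python
import unicodedata

variants = "𝚺𝛴𝜮𝝨𝞢"

# NFKC is the normalisation form Python applies to identifiers.
print([unicodedata.normalize("NFKC", ch) for ch in variants])

# Assign through one variant, then look the name up under plain Σ.
namespace = {}
exec("𝚺 = 42", namespace)
print("Σ" in namespace, namespace["Σ"])
```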

> when editing a code with an Unicode
> char like that, most people would probably copy and paste the symbol
> instead of typing it, leading to a consistent use of the same symbol.

You are assuming that the programmer's font includes glyphs for all 
six of those code points. More likely, the programmer will see Σ for the 
first code point, and the other five will display as a pair of "missing 
glyph" boxes. (That's exactly what I see in my mail client, and in the 
Python interpreter.)

Why a pair of boxes? Because they are code points in the Supplementary 
Multilingual Plane (SMP), and require *two* 16-bit code units in UTF-16. 
So naive Unicode software with poor support for the SMP will display two 
boxes, one for each surrogate code point.
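
To see the two code units for yourself, here is a minimal sketch (the helper utf16_code_units is a name invented for this example):

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units for a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Plain Σ fits in one code unit; the mathematical bold variant needs a
# surrogate pair, which is what naive software renders as two boxes.
for ch in "Σ𝚺":
    units = utf16_code_units(ch)
    print("U+%04X" % ord(ch), [hex(u) for u in units])
```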

Even if the code points display correctly, with distinct glyphs, your 
comment that most people will be forced to copy and paste the symbol is 
precisely why I am reluctant to see Python introduce non-ASCII keywords 
or operators. It's a pity, because I think that non-ASCII operators at 
least can make a much richer language (although I wouldn't want to see 
anything as extreme as APL). Perhaps I will change my mind in a few more 
years, as the popularity of emoji encourages more applications to have 
better support for non-ASCII text and the SMP.

> 6. Python 3 code is UTF-8 and Unicode identifiers are allowed. Not having
> Unicode keywords is merely contingent on Python 2 behavior that emphasized
> ASCII-only code (besides comments and strings).

No, it is a *policy decision*. It is not because Python 2 didn't support 
them. Python 2 didn't support non-ASCII identifiers either, but Python 3 
intentionally broke with that.

> 7. The discussion isn't about lambda or anti-lambda bias, it's about
> keyword naming and readability. Who gains/loses with that resource? It
> won't hurt those who never uses lambda and never uses Unicode identifiers.

It will hurt those who have to read code containing a mystery λ whose 
meaning they don't know and which they have no idea how to search for. 
At least "python lambda" is easy to search for.

It will hurt those who want to use λ as an identifier. I include myself 
in that category. I don't want λ to be reserved as a keyword.
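
As things stand today, λ is an ordinary identifier. A small sketch to check this (the wavelength variable and its value are purely illustrative):

```python
import keyword

print("λ".isidentifier())           # usable as a name today
print(keyword.iskeyword("λ"))       # not reserved
print(keyword.iskeyword("lambda"))  # the existing keyword

# λ as an ordinary variable, here holding a wavelength in nanometres.
λ = 532
print(λ * 2)
```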

I look at it like this: using λ as a keyword makes as much sense as making 
f a keyword so that we can save a few characters by writing:

    f myfunction(arg, x, y):

instead of def. I use f as an identifier in many places, e.g.:

    for f in list_of_functions:

or in functional code:

    compose(f, g)

Yes, I can *work around it* by naming things f_ instead of f, but that's 
ugly. Even though it saves a few keystrokes, I wouldn't want f to be 
reserved as a keyword, and the same goes for λ as lambda.

> 8. I don't know if any consensus can emerge in this matter about lambdas,
> but there's another subject that can be discussed together: macros. 

I'm pretty sure that Guido has ruled "Over My Dead Body" to anything 
resembling macros in Python.

However, we can experiment with adding keywords and macro-like 
facilities without Guido's permission. For example:


It's a joke, of course, but the technology is real.

Imagine, if you will, that you could declare a "dialect" at the 
start of Python modules, just after the optional encoding cookie:

    # -*- coding: utf-8 -*-
    # -*- dialect math -*-

which would tell importlib to run the code through some sort of 
source/AST transformation before importing it. That will allow us to 
localise the keywords, introduce new operators, and all the other things 
Guido hates *wink* and still be able to treat the code as normal Python.
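
To make the idea concrete, here is a rough sketch of the source-transformation half only. Everything named here is hypothetical: the DIALECTS table, the cookie regex and the functions find_dialect, translate and run_dialect are made up for illustration, and it execs a string rather than hooking into importlib. It uses the standard tokenize module to rewrite NAME tokens, so a λ inside a string or comment is left alone.

```python
import io
import re
import tokenize

# Hypothetical dialect table: dialect name -> keyword substitutions.
DIALECTS = {"math": {"λ": "lambda"}}

# Matches the hypothetical "# -*- dialect NAME -*-" cookie.
DIALECT_COOKIE = re.compile(r"^#\s*-\*-\s*dialect\s+(\w+)\s*-\*-")

def find_dialect(source):
    """Return the dialect named in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = DIALECT_COOKIE.match(line)
        if match:
            return match.group(1)
    return None

def translate(source):
    """Rewrite dialect keywords into standard Python source."""
    substitutions = DIALECTS.get(find_dialect(source), {})
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            text = substitutions.get(text, text)
        tokens.append((tok.type, text))
    # With (type, string) pairs, untokenize regenerates valid source,
    # although the spacing will differ from the original.
    return tokenize.untokenize(tokens)

def run_dialect(source):
    """Translate and execute source, returning the resulting namespace."""
    namespace = {}
    exec(compile(translate(source), "<dialect>", "exec"), namespace)
    return namespace

SOURCE = (
    "# -*- coding: utf-8 -*-\n"
    "# -*- dialect math -*-\n"
    "square = λ x: x ** 2\n"
)

print(run_dialect(SOURCE)["square"](7))
```

Without the dialect line, the same machinery leaves λ untouched, so existing code that uses it as an identifier keeps working. A real version would presumably live in an import hook and would need to preserve line and column information for tracebacks.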

A bad idea? Probably an awful one. But it's worth experimenting with it. 
It will be fun, and it *just might* turn out to be a good idea.

For the record, in the 1980s and 1990s, Apple used a similar idea for 
two of their scripting languages, HyperTalk and AppleScript, allowing 
users to localise keywords. HyperTalk is now defunct, and AppleScript 
has dropped that feature, which suggests that it is a bad idea. Or maybe 
it was just ahead of its time.

