Well, I finally ran into a Python Unicode problem, sort of
Chris Angelico
rosuav at gmail.com
Sun Jul 3 03:41:59 EDT 2016
On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky <john_ladasky at sbcglobal.net> wrote:
> Up until today, every character I've tried has been accepted by the Python interpreter as a legitimate character for inclusion in a variable name. Now I'm copying a formula which defines a gradient. The nabla symbol (∇) is used in the naming of gradients. Python isn't having it. The interpreter throws a "SyntaxError: invalid character in identifier" when it encounters the ∇.
>
> I am now wondering what constitutes a valid character for an identifier, and how they were chosen. Obviously, the Western alphabet and standard Greek letters work. I just tried a few very weird characters from the Latin Extended range, and some Cyrillic characters. These are also fine.
>
Very good question! The detailed answer is here:
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
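A quick way to try candidate names against those rules without writing real code is str.isidentifier(). The sample names below are just illustrations: plain ASCII, Greek, a Latin Extended letter (ħ) and Cyrillic all pass, while anything containing ∇ fails. Note that isidentifier() does not reject keywords such as "class".

```python
# str.isidentifier() checks a string against Python's identifier rules
# (keywords are not rejected), so it is a cheap way to test variable names.
candidates = ["gradient", "αβγ", "ħbar", "переменная", "∇f", "f∇"]

results = {name: name.isidentifier() for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {ok}")
```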
> A philosophical question. Why should any character be excluded from a variable name, besides the fact that it might also be an operator?
>
In a way, that's exactly what's happening here. Python permits certain
categories of character as identifiers, leaving other categories
available for operators. Even though there aren't any non-ASCII
operators in vanilla CPython, it's plausible that someone could
create a Python-based language with more operators (e.g. ≠ NOT EQUAL TO
as an alternative to !=), and I'm sure you'd agree that saying "≠ = 1"
is nonsensical.
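To see where stock CPython stands today, you can feed a few one-liners to the built-in compile(). This is a small illustrative check (the snippets are made up): the ASCII != compiles, while ≠ is rejected both as a would-be operator and as a name, just like ∇. Only the exception type is checked, since the exact error message varies between Python versions.

```python
# Compile a few one-line snippets; the tokenizer rejects non-ASCII math
# symbols whether they are used as names or as would-be operators.
snippets = ["a != b", "a ≠ b", "≠ = 1", "∇f = 0"]

outcomes = {}
for source in snippets:
    try:
        compile(source, "<snippet>", "exec")
        outcomes[source] = "compiled"
    except SyntaxError:
        outcomes[source] = "SyntaxError"

for source, outcome in outcomes.items():
    print(f"{source!r}: {outcome}")
```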
> This might be a problem I can solve, I'm not sure. Is there a file that the Python interpreter refers to which defines the accepted variable name characters? Perhaps I could just add ∇.
>
The key here is its Unicode category:
>>> import unicodedata
>>> unicodedata.category("∇")
'Sm'
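For comparison, here is the same lookup over a handful of characters (chosen only as examples). The letters are all L* categories and pass isidentifier(); ∇, ≠ and ∂ share Sm with each other, and $, ^ and © fall into the other symbol categories Sc, Sk and So.

```python
import unicodedata

# Category codes: Ll/Lu = lowercase/uppercase letter, Sm = math symbol,
# Sc = currency symbol, Sk = modifier symbol, So = other symbol.
rows = [(ch, unicodedata.category(ch), ch.isidentifier())
        for ch in ["x", "α", "Ж", "∇", "≠", "∂", "$", "^", "©"]]

for ch, category, ok in rows:
    print(f"{ch}  {category}  identifier={ok}")
```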
You could probably hack CPython to include Sm, and maybe Sc, Sk, and
So, as valid identifier characters. I'm not sure where, though. I've
just spent a good bit of time delving (it's based on the XID_Start and
XID_Continue derived properties, but I have no idea where they're
defined; Tools/unicode/makeunicodedata.py looks promising, but even
there, I can't find it). Alternatively, or in addition, you could
appeal to the core devs to have the category or categories in question
added to the official Python spec. Symbols like that are a bit of a
grey area, so you may find that you're starting a huge debate :)
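To make "include Sm" a bit more concrete, here's a rough, pure-Python sketch of a category-based identifier check. It is an approximation, not CPython's implementation: the documented rule (XID_Start/XID_Continue) also pulls in the Other_ID_Start/Other_ID_Continue properties and NFKC closure, which this ignores. The function looks_like_identifier and its extra parameter are hypothetical, made up for illustration.

```python
import unicodedata

# Approximation of the documented categories: a start character is a letter
# (Lu, Ll, Lt, Lm, Lo), a letter number (Nl) or "_"; continuation characters
# may also be Mn, Mc, Nd or Pc. Other_ID_* and NFKC closure are ignored here.
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE = START | {"Mn", "Mc", "Nd", "Pc"}


def looks_like_identifier(name: str, extra: frozenset = frozenset()) -> bool:
    """Roughly mimic str.isidentifier(); `extra` (hypothetical) adds
    categories to both the start and continuation sets."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in START | extra:
        return False
    return all(unicodedata.category(ch) in CONTINUE | extra for ch in rest)


print(looks_like_identifier("∇f"))                       # False
print(looks_like_identifier("∇f", frozenset({"Sm"})))    # True
print(looks_like_identifier("a+b", frozenset({"Sm"})))   # True
```

The last line shows why a blanket change would be contentious: ASCII +, <, =, >, | and ~ are all category Sm too, so simply adding Sm would also swallow existing operators unless they were carved out again.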
Have fun.
ChrisA