
On Thu, 17 Mar 2022 at 23:19, Stéfane Fermigier <sf@fermigier.com> wrote:
The “correct” (according to Bourbaki) mathematical notation for an empty set is “∅” (aka Unicode U+2205, or HTML &empty;)
Some time ago, for a project which had a lot of empty sets, I tried to use this symbol as a shorthand for set(). But:
>>> ⦰ = set()
  File "<stdin>", line 1
    ⦰ = set()
    ^
SyntaxError: invalid character '⦰' (U+29B0)
>>> ø = set()
In other words, “⦰” is illegal as an identifier in Python (same for “⌀”, aka U+2300 DIAMETER SIGN), but “ø” (aka U+00F8 LATIN SMALL LETTER O WITH STROKE) is legal!
So I used “ø” instead of “⦰”, but I eventually dropped the whole idea because, IIRC, some tools weren’t too happy with it.
Still, I guess it would be neither too hard nor too disruptive to accept “⦰”, as well as some other mathematical characters, as identifiers in Python.
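The claims in the quoted message can be checked directly. The sketch below compares `str.isidentifier()`, an actual `compile()` attempt, and a simplified category-based approximation of Python's identifier rules for the four characters mentioned. The helpers `looks_like_identifier` and `can_assign` are hypothetical names for illustration, and the approximation is an assumption: the language reference defines identifiers via the Unicode XID_Start/XID_Continue properties plus NFKC normalization, so a plain category check can disagree on a handful of unusual characters.

```python
import unicodedata

# Characters mentioned in the thread, written as escapes to avoid encoding issues.
CANDIDATES = {
    "\u29B0": "REVERSED EMPTY SET",
    "\u2205": "EMPTY SET",
    "\u2300": "DIAMETER SIGN",
    "\u00F8": "LATIN SMALL LETTER O WITH STROKE",
}

# Simplified approximation (an assumption): the real rules use the Unicode
# XID_Start/XID_Continue properties, which add a few "Other_ID_Start" and
# "Other_ID_Continue" characters, and identifiers are NFKC-normalized.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def looks_like_identifier(name: str) -> bool:
    """Hypothetical helper: category-based approximation of str.isidentifier()."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # The first character must be "_" or in a letter-like category.
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    # Later characters may also be marks, decimal digits or connectors like "_".
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)


def can_assign(name: str) -> bool:
    """Hypothetical helper: does `<name> = set()` compile?"""
    try:
        compile(f"{name} = set()", "<test>", "exec")
    except SyntaxError:
        return False
    return True


for ch, label in CANDIDATES.items():
    print(
        f"U+{ord(ch):04X} {label:<32} category={unicodedata.category(ch)} "
        f"isidentifier={ch.isidentifier()} approx={looks_like_identifier(ch)} "
        f"compiles={can_assign(ch)}"
    )
```

Only “ø”, a letter (category Ll), passes all three checks; the symbols are rejected by the tokenizer before any name lookup happens.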
>>> unicodedata.category("⦰")
'Sm'
https://www.fileformat.info/info/unicode/category/Sm/list.htm

This is the "Symbol, math" category. Python's support for characters in identifiers is, apart from some compatibility rules to ensure that treatment of ASCII hasn't changed since Py2, based on these categories, and this one is primarily composed of what we would call symbols, not letters (if you prefer, they're more like "punctuation" than "words").

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

Supporting these in identifiers is fundamentally incompatible with supporting them as literals, with the exception of keywords, which always represent specific values (for instance, True does not mean "construct a new boolean object with the value True"; it means "use the existing instance of True"). Since an empty set needs to be constructed every time, using it as an identifier seems backwards; it would be more useful to define it as a literal instead. There are problems with creating non-ASCII literal forms, but I believe fewer than with allowing symbols as identifiers.

ChrisA
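To illustrate why a name is a poor fit for something that must be constructed every time, here is a minimal sketch in which a module-level name plays the role of the proposed shorthand. `EMPTY` and `add_tag` are hypothetical names. Because the name is bound to one set object, any mutation is visible through every later use, whereas evaluating `set()` builds a fresh object each time and `True` always refers to the same pre-existing object.

```python
# Hypothetical shorthand bound once at module level, standing in for "ø = set()".
EMPTY = set()


def add_tag(tags=EMPTY):
    """Hypothetical helper: marks a tag set as seen, defaulting to EMPTY."""
    tags.add("seen")
    return tags


first = add_tag()
second = add_tag()
# Both calls mutated the one shared object, so the "empty set" is no longer empty.
print(first is second, EMPTY)  # True {'seen'}

# Evaluating set() constructs a new object every time.
a, b = set(), set()
print(a is b)  # False

# True, by contrast, always names the same pre-existing object.
print(bool(1) is True)  # True
```

If a shared name is wanted regardless, binding it to `frozenset()` sidesteps the mutation problem, at the cost of a different (immutable) type.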