[Python-ideas] Allow additional separator character in variables

Mikhail V mikhailwas at gmail.com
Sun Nov 19 19:01:59 EST 2017

On Sun, Nov 19, 2017 at 5:16 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 19 November 2017 at 13:22, Mikhail V <mikhailwas at gmail.com> wrote:
>> For me, one "cheap" solution against underscores is to use
>> syntax highlighting which grays them out, but if those become like
>> spaces, then it becomes a bit confusing, e.g. in function with many arguments.
>> Also, unfortunately, not many editors allow easy (if any) highlighting
>> customisation on that level.
> Changing the way editors display underscore-using variable names still
> seems like a more productive direction to explore than changing the
> text encoding read by the compiler.

Indeed that would be a solution. *Would* be. But I don't know of
any editor that does that (and they should not in this case, see below).

My view on pros&cons for this solution:

Pros: other languages have the same issue, so if editor maintainers
agreed to compromise and introduce a dynamic-substitution feature,
that would give users the possibility to face-lift other syntaxes as well.

Cons: this feature would make sense only if the substitution happens
in those parts where it should, namely it should not touch anything
in string literals or comment blocks. So the lexer should 'know' where
to substitute and where not, and that is not the same as just passing the
internal memory representation through a translation table.
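
To illustrate why a plain translation table is not enough, here is a minimal
sketch of a token-aware substitution built on the standard `tokenize` module.
The function name `display_form` and the choice of U+2010 as the displayed
character are assumptions made for this example, not something any editor is
known to implement. Only underscores between letters or digits inside
identifier tokens are changed; strings and comments pass through untouched.

```python
import io
import re
import tokenize

# Underscore with a letter or digit on both sides (leaves __dunder__ alone).
_INNER_UNDERSCORE = re.compile(r"(?<=[^\W_])_(?=[^\W_])")


def display_form(source: str, shown: str = "\u2010") -> str:
    """Return `source` with inner underscores of identifiers shown as `shown`.

    Hypothetical helper: only NAME tokens are rewritten, so string
    literals and comments keep their underscores.
    """
    lines = source.splitlines(keepends=True)
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and "_" in tok.string:
            row, col = tok.start
            end_col = tok.end[1]
            edits.append((row, col, end_col,
                          _INNER_UNDERSCORE.sub(shown, tok.string)))
    # Apply from the end so earlier column offsets stay valid.
    for row, col, end_col, text in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + text + line[end_col:]
    return "".join(lines)


if __name__ == "__main__":
    print(display_form('my_var = "keep_this"  # and_this\n', "-"))
```

The result is meant for display only: it is no longer valid Python, which is
exactly why an editor would have to keep the real text and the rendered text
apart.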

My opinion about this however is based on other principles.
Imagine that you are the language designer and I am responsible
for the typesetting component of some editor, and we have such a dialogue:

you: "hey Mikhail, we use the hyphen for the minus operator, now can you
please patch the renderer so that our users see the minus instead of the
hyphen, and please make sure users can also toggle it in real time to see
what actual char is there, and also make the substitution only in the places
where the hyphen is used as the operator."

me: "well, I understand your complaint, but my renderer already supports
Unicode, and I do my best to support typography practices, namely render
hyphen as *hyphen*, which has been well established for centuries in typography,
and defined as a dash of 50% width of the letter "o" and is aligned to
As well as the Minus glyph, which is defined as ca. 110% of "o" width
and is aligned to the digits&caps.
So you as the language designer should be interested in delivering best
practices to the users, and the hyphen is way more important for the
lexical structure of the written language than the minus operator.
Why would you not just try to solve the issue in a "fair" way?"

By the "fair" way I mean one that tends to bring the correct usage of
characters back, instead of trying to hide the problem with some patch.
Now, I can't say what the least problematic way is for Python, but if I were
responsible for that, I would base the solution on these principles:

1. Future versions of the syntax, ideally, must allow ONLY the minus sign
U+2212 for the minus operator, and allow hyphens U+002D in identifiers. Since
that is not so at the current moment, I must think out the least painful
transition.
2. I want users to be able to use the underscore as well. The underscore is
derived from mechanical typewriters: to make underlined text one pushed the
carriage back and typed the underscore to make the line under the text.
Currently in digital print it does not make much sense and as a separator it
looks ugly, but still it is not so hopeless. Currently the underscore lies
below the font baseline, but if one moves it closer to the baseline, then it
can be used as a fairly adequate additional separator, so a user would get
more ways to denote lexical identifiers.
3. I don't want to break backward compatibility, but still I am oriented
towards compliance with typography practices and standards for charcodes.
Also I want users who are interested in better UX to get the benefits
out of the box, without forcing them to tweak their text editors or write
their own translators.
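
For reference on point 1, the characters involved can be inspected with the
standard `unicodedata` module. The helper name `describe` is made up for this
example. Note that Unicode names U+002D "HYPHEN-MINUS" and has a separate
U+2010 "HYPHEN"; of the four characters below, only the underscore is
currently accepted inside a Python identifier.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, usable inside an identifier today?)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ("a" + ch + "b").isidentifier())


if __name__ == "__main__":
    for ch in ("\u002d", "\u2010", "\u2212", "\u005f"):
        print(*describe(ch))
```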

What to do? One option IMO would be to introduce a header in the sources, e.g.:
# opt-in: hyphen-minus

Which would tell the parser to toggle the "new" rules, namely U+2212 would be
parsed as the minus operator and hyphens as part of identifiers.
Then users who are aware of the benefits and remember monospaced fonts only
as an unpleasant incident from their youth can enjoy the beauty of source code
without any tweaks, and the only thing they need to do is to bind a key to input
the U+2212 sign.

The users who do not want it just leave this out. Further, I'd add a
util that can directly translate to the "old" syntax, in case one wants
to export a project in the old syntax. So one could avoid backward
compatibility issues.
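
A rough sketch of what such a translation util could look like, under several
assumptions that are not part of the proposal: the header must appear in the
first two lines, hyphens in identifiers are mapped to underscores on export,
and a simple regex scanner stands in for a real lexer (it ignores corner
cases such as float literals like `1.e-5`). The names `OPT_IN` and
`to_old_syntax` are invented for illustration.

```python
import re

OPT_IN = "# opt-in: hyphen-minus"  # header text as written in the proposal

# String literals and comments are matched first so they are copied verbatim.
_SKIP = "|".join([
    r'"""[\s\S]*?"""',       # triple-quoted strings
    r"'''[\s\S]*?'''",
    r'"(?:\\.|[^"\\\n])*"',  # single-line strings
    r"'(?:\\.|[^'\\\n])*'",
    r"#[^\n]*",              # comments
])
_TOKEN = re.compile(
    rf"(?P<skip>{_SKIP})"
    r"|(?P<name>(?<!\w)[^\W\d]\w*(?:-\w+)+)"  # identifier containing hyphens
    r"|(?P<minus>\u2212)"                     # U+2212 MINUS SIGN
)


def _replace(match):
    if match.group("skip") is not None:
        return match.group("skip")
    if match.group("name") is not None:
        # Assumption: exported identifiers use underscores instead of hyphens.
        return match.group("name").replace("-", "_")
    return "-"  # U+2212 becomes the ASCII hyphen-minus operator


def to_old_syntax(source: str) -> str:
    """Translate opt-in "new" syntax to current Python; other files pass through."""
    header = source.splitlines()[:2]
    if not any(line.strip() == OPT_IN for line in header):
        return source
    return _TOKEN.sub(_replace, source)


if __name__ == "__main__":
    demo = OPT_IN + '\ntotal-sum = first-item \u2212 2  # keep a-b here\n'
    print(to_old_syntax(demo))
```

Mapping hyphens to underscores can collide with an existing name (`a-b` and
`a_b` would both become `a_b`), so a real tool would need to detect that.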

That is just one option that comes to my mind.

Another thing which might be important in this regard:
Say you want to publish a book about Python. With such syntax you could
directly import the code into DTP software, and you don't need to
make any corrections, so it looks almost like normal English
text, and no worries about strange-looking minus operators.

