[Python-ideas] Allow additional separator character in variables

Wed Nov 22 22:46:04 EST 2017

On 21 November 2017 at 21:55, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

> Personally, I think that Python probably should ban non-ASCII
> non-letter characters in identifiers and whitespace, and maybe add
> them later in response to requests from native speakers of the
> relevant languages.  I don't know how easy that would be to do,
> though, since I think the rule is already that identifiers must be
> composed only of letters, numbers, and ASCII "_".  Since Serhiy's
> examples are valid, we'd have to rule them out explicitly, rather than
> by reference to the Unicode database.  Yuck.
>

We're not going to start second-guessing the Unicode Consortium on this
point - human languages are complicated, and we don't have any special
insight that they lack. PEP 3131
(https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes)
delegated this aspect of the language to them by way of the XID_Start and
XID_Continue properties, and we're not going to change that.
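
As a rough illustration of what that delegation looks like from Python 3
code, the sketch below uses str.isidentifier(), which applies the
identifier definition from the language reference, together with
keyword.iskeyword(). The helper name is_usable_name and the sample strings
are assumptions made up for this example.

```python
import keyword


def is_usable_name(name: str) -> bool:
    """Return True if `name` is lexically an identifier and not a keyword.

    Hypothetical helper for illustration only; str.isidentifier() does the
    actual XID_Start / XID_Continue based check.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


# Sample strings chosen for this sketch: ASCII, non-ASCII letters,
# a leading digit, a hyphen, embedded whitespace, and a reserved word.
samples = ["snake_case", "naïve", "变量", "2fast", "a-b", "a b", "class"]
for sample in samples:
    print(f"{sample!r:14} {is_usable_name(sample)}")
```

Non-ASCII letters are accepted because the Unicode database classifies them
as identifier characters, not because Python keeps its own list.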

Any hybrid Python 2/3 application or library is necessarily restricted to
ASCII-only identifiers, since that's all that Python 2 supports.

We've also explicitly retained the ASCII-only restriction for PyPI
distribution names (see https://www.python.org/dev/peps/pep-0508/#names),
but that doesn't restrict the names used for import packages, only the
names used to publish and install those components. If we ever decide to
lift that restriction, it will likely be by way of
https://en.wikipedia.org/wiki/Punycode, similar to the way
internationalized domain names work, as well as the way multi-phase
extension module initialization locates init functions for extension
modules with non-ASCII names.
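
For a sense of what the Punycode approach looks like, here is a minimal
sketch based on the export hook naming rule described in PEP 489 (PyInit_
plus the name for ASCII module names, PyInitU_ plus the Punycode-encoded
name with hyphens replaced by underscores otherwise). The function name
init_hook_name and the module names are assumptions for illustration; this
is not the import system's actual code.

```python
def init_hook_name(module_name: str) -> str:
    """Return the init function name an extension module would export.

    Hypothetical helper following the rule described in PEP 489.
    """
    try:
        # ASCII names are used unchanged after the "PyInit_" prefix.
        suffix = b"_" + module_name.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII names go through the stdlib "punycode" codec; hyphens
        # are not valid in C identifiers, so they become underscores.
        suffix = b"U_" + module_name.encode("punycode").replace(b"-", b"_")
    return "PyInit" + suffix.decode("ascii")


for name in ["spam", "münchen"]:
    print(name, "->", init_hook_name(name))

# The encoding is reversible, which is what makes it usable for lookups.
print(b"mnchen-3ya".decode("punycode"))
```

The same encoding, with an "xn--" prefix, is what internationalized domain
names use on the wire.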

Beyond that, I'll note that these questions were all raised in the original
PEP: https://www.python.org/dev/peps/pep-3131/#open-issues

The reference interpreter really isn't the place to experiment with
answering them - rather, they're more a question for opt-in code analysis,
since that makes it possible for folks to choose settings that are right
*for them* (e.g. by defining a set of "permitted scripts" [1], specifying
the Unicode characters that should be allowed in identifiers beyond the
core set of "Latin" code points allowed by ASCII).
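
A "permitted scripts" check of that kind could be sketched roughly as
follows. Note the assumptions: the standard library's unicodedata module
does not expose the Unicode Script property, so this sketch approximates
the script by the first word of each character's Unicode name. That is a
heuristic (combining marks, for instance, report as "COMBINING"), and a
real tool would want proper script data. PERMITTED_SCRIPTS and
disallowed_chars are hypothetical names invented for this example.

```python
import unicodedata

# Hypothetical project setting: scripts allowed beyond plain ASCII.
PERMITTED_SCRIPTS = {"LATIN"}


def disallowed_chars(identifier, permitted=PERMITTED_SCRIPTS):
    """Return (char, unicode_name) pairs that fall outside `permitted`.

    The "script" is approximated by the first word of the character name,
    which is a heuristic rather than the real Unicode Script property.
    """
    flagged = []
    for ch in identifier:
        if ord(ch) < 128:
            # ASCII is always allowed; the compiler handles validity.
            continue
        name = unicodedata.name(ch, "")
        script = name.split(" ", 1)[0] if name else "UNKNOWN"
        if script not in permitted:
            flagged.append((ch, name or "<unnamed>"))
    return flagged


for ident in ["x_1", "naïve", "переменная", "变量"]:
    print(ident, disallowed_chars(ident))
```

A project that works in Russian could pass permitted={"LATIN", "CYRILLIC"}
instead, which is the kind of per-project choice the interpreter itself
can't make.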

Cheers,
Nick.

[1] https://en.wikipedia.org/wiki/Script_(Unicode)

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia