[Python-3000] Support for PEP 3131

Fri May 25 11:36:28 CEST 2007

On Thu, 24 May 2007, Guido van Rossum wrote:
> If there's a security argument to be made for restricting the alphabet
> used by code contributions (even by co-workers at the same company), I
> don't see why ASCII-only projects should have it easier than projects
> in other cultures.

This keeps getting characterized as only a security argument, but
it's much deeper; it's a basic code comprehension issue.  It's all
five of the issues I mentioned at

    http://mail.python.org/pipermail/python-3000/2007-May/007855.html

and the additional point about Unicode standards raised by Jim at

    http://mail.python.org/pipermail/python-3000/2007-May/007863.html

I still believe all of these should at least be acknowledged in the PEP.

----

If you like, you could look at this as trying to serve two different
communities, the "ASCII folks" and the "non-ASCII folks", as has been
said in other messages here.  (IMHO, it would be better to think of
many different communities of non-ASCII folks rather than just one,
which is why the choose-your-own-table solution makes the most sense.)

But suppose we just look at the simpler question of "what should the
default be?" -- there are two possible behaviours; which should the
default favour?  All these decision criteria agree:

  - Explicit or implicit?  Better to explicitly enable the new feature.

  - Simple or complex?  ASCII is the simpler character set.

  - Majority or minority?  By far the majority will use only ASCII.

  - Status quo or new behaviour?  ASCII is established and familiar.

The safer choice is to stick to ASCII by default.  There's nothing to
lose by doing so.  Why rush to change the lexical syntax?  Why is it
*necessary* to do it right now, and all at once, and by default?

----

> A more useful approach would seem to be a set of auditing tools that
> can be applied routinely to all new contributions (e.g. as a
> pre-commit hook when using a source control system), or to all code in
> a given directory, download, etc. I don't see this as all that
> different from using e.g. PyChecker or PyLint.
[...]
> Scanning for stray non-ASCII characters is best
> left to automated tools.

...like the Python interpreter.  Having the Python interpreter do this
is a good idea for all the same reasons that the Python interpreter
checks for tab/space inconsistency.
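
For anyone who hasn't run into the tab/space check, here is a minimal
sketch of what it looks like from the outside.  It assumes a current
Python 3 interpreter, where inconsistent indentation is rejected with
TabError; the helper name check() is made up for this illustration.

```python
# One line indented with a tab, the next with eight spaces.  In many
# editors the two lines look identical, but the interpreter refuses to guess.
source = "if True:\n\tx = 1\n        y = 2\n"

def check(src):
    """Compile src and report whether the interpreter accepted it."""
    try:
        compile(src, "<example>", "exec")
    except TabError as exc:
        # TabError is a subclass of IndentationError and SyntaxError.
        return f"rejected: {exc.msg}"
    return "accepted"

print(check(source))
print(check("if True:\n    x = 1\n    y = 2\n"))
```

The first call is rejected and the second is accepted, with no extra
tooling involved.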

Imagine a parallel universe in which Python has always forbidden
tabs and only allowed spaces for indentation.  In Python 3.0, it is
proposed to introduce tabs.  Alter-Guido announces he will accept
the proposal.  Some folks are opposed to adding tabs, saying it could
be confusing, but he disagrees.  Some folks suggest that this feature
could at least be made optional, but he disagrees.  Some folks suggest
that the Python interpreter should at least warn when this happens,
but he disagrees.

"But," they say, "mixing tabs and spaces can yield programs that have
invisibly different meanings."  "No matter," says alter-Guido, "you
just shouldn't do that."  Or "You should use an editor that takes care
of this for you."  Or "You need to write your own checking tools and
scan all your code before you check it in."  "But what about all the
users who aren't aware of this change?" they ask.

Wouldn't it just be so much easier if the Python interpreter did the
checking?  In our universe, it does, and this is a very good thing.
Why did we decide to do that?

I would say, because it makes our programs more reliable, and it
means we have less to worry about when we're coding.  Is it a
"security issue"?  You could call it that, but really it's just a
sanity issue.
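
To give a sense of how small the corresponding check for identifiers
is, here is a rough sketch.  It assumes a Python 3 interpreter in which
PEP 3131 is in effect, so that the tokenizer accepts non-ASCII names;
the function name non_ascii_identifiers() is hypothetical, and this is
an illustration of the idea, not a proposal for how the interpreter
should implement it.

```python
import io
import tokenize

def non_ascii_identifiers(source):
    """Return (line, column, name) for each identifier with non-ASCII characters.

    Only NAME tokens are examined, so non-ASCII text inside string
    literals and comments is not reported.
    """
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits

# The sample is written with escapes so this snippet itself stays ASCII.
sample = "total = 1\nr\u00e9sum\u00e9 = total + 1\n"
for line, col, name in non_ascii_identifiers(sample):
    print(f"line {line}, column {col}: {name!r}")
```

A project that wants a different table of allowed characters would
swap the isascii() test for its own predicate.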

-- ?!ng