[Python-3000] Conservative Defaults (was: Re: Support for PEP 3131)

Mon Jun 4 16:05:13 CEST 2007

On 6/3/07, BJörn Lindqvist <bjourne at gmail.com> wrote:

[Most deleted, Stephen Turnbull already answered better than I knew,
let alone could write]

> > The same one-step-at-a-time reasoning applies to unicode identifiers.
> > Allowing IDs in your native language (or others that you explicitly
> > approve) is probably a good step.  Allowing IDs in *any* language by
> > default is probably going too far.

> If you set different native languages won't you get the exact same
> problems that codepages caused and that unicode was invented to solve?

Not at all; if anything, it is the opposite.

(1)  Those different code pages were mainly used for text, not
programming logic.  No one has suggested (re-)limiting comments or
even (continuing to limit) strings.

(2)  The biggest problem that I saw in practice was partial overlap;
people would assume WYSIWYG, and the different code pages were close
enough (mostly overlapping in ASCII) that they didn't usually need to
use the same code page -- but then when the differences did bite, they
were harder to notice.

If you happen to use both Sanskrit and Coptic, you can set your own
computer to accept both.  The only catch is that you probably can't
share the Sanskrit with the Coptic community (or vice versa), unless
at least one of the following is true:

    (2a)  The code itself (not comments or strings) is in ASCII, so
both can read it.  Note that this is already the recommended policy for
shared code.

or (2b)  The people you are sharing with trust you enough to add your
script as an acceptable alternate.  (Again, preferably a simple
one-time step -- but an explicit decision.)

or (2c)  The people you are sharing with have already decided to
accept Sanskrit (or Coptic) because other people they trusted were
using it, and said it was safe.

Options 2b and 2c both rely on the "consenting adults" policy, but
they encourage "informed consent".  I wouldn't be surprised to
discover that Latin-1, Sanskrit, Coptic, and the Japanese characters
were all OK with me.

That still wouldn't mean I want to allow Cyrillic (which carries a
higher risk of confusables, since many Cyrillic letters look identical
to Latin ones).
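
To make the confusable risk concrete, here is a minimal sketch.  It
assumes present-day Python 3 (which postdates this draft discussion),
and the names latin and mixed are purely illustrative:

```python
import unicodedata

latin = "pay"
mixed = "p\u0430y"  # middle letter is U+0430, not ASCII "a"

# The two strings render alike in most fonts but are different identifiers.
print(unicodedata.name(mixed[1]))   # CYRILLIC SMALL LETTER A
print(latin == mixed)               # False
print(mixed.isidentifier())         # True

# Compatibility normalization does not fold Cyrillic into Latin.
print(unicodedata.normalize("NFKC", mixed) == latin)  # False
```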

I already know I don't want to auto-allow U+FF10-U+FF19 (the fullwidth
ASCII digits[1]), simply because I don't see any good
(non-presentational) reason to use them in place of the normal ASCII
digits -- so the more likely result of using them is confusion.

Adding one script (or character range) at a time lets me add things
that I (or people I trust) think are reasonable.  Turning unicode on
or off with a single blunt switch does not.
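
As a rough sketch of what "one script (or character range) at a time"
could look like, assuming modern Python 3: ALLOWED_RANGES,
disallowed_chars, and accept are hypothetical names invented for this
illustration, not part of PEP 3131 or any proposed API.  The ranges
are the standard Unicode block boundaries.

```python
# Hypothetical per-user allowlist: block name -> inclusive code point range.
ALLOWED_RANGES = {
    "ASCII": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Devanagari": (0x0900, 0x097F),  # the usual script for Sanskrit
}

def disallowed_chars(identifier, allowed=ALLOWED_RANGES):
    """Return the characters that fall outside every allowed range."""
    return [ch for ch in identifier
            if not any(lo <= ord(ch) <= hi for lo, hi in allowed.values())]

def accept(identifier, allowed=ALLOWED_RANGES):
    """True if the name is a valid identifier using only allowed ranges."""
    return identifier.isidentifier() and not disallowed_chars(identifier, allowed)

# Opting in to one more script is an explicit, one-time step.
with_cyrillic = dict(ALLOWED_RANGES, Cyrillic=(0x0400, 0x04FF))

print(accept("na\u00efve"))                     # Latin-1 letter: accepted
print(accept("\u0438\u043c\u044f"))             # Cyrillic: rejected by default
print(accept("\u0438\u043c\u044f", with_cyrillic))  # accepted after opting in
print(accept("x\uff11"))                        # fullwidth digit: rejected
```

The point of the design is that rejection is the default, and each
addition is a visible decision rather than a side effect of one switch.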

-jJ

[1]  Yes, the fullwidth ASCII variants are allowed as ID characters
according to both the Unicode ID_* and XID_* properties, which means
they are allowed by the current draft.
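
The footnote can be checked with a short sketch, assuming present-day
Python 3, where str.isidentifier() follows the XID_* properties.  The
variable names are illustrative only:

```python
import unicodedata

fullwidth_digits = [chr(cp) for cp in range(0xFF10, 0xFF1A)]
name = "x" + fullwidth_digits[1]  # "x" followed by FULLWIDTH DIGIT ONE

# Every character in the range is a decimal digit (category Nd),
# which makes it a legal identifier-continue character.
print(all(unicodedata.category(d) == "Nd" for d in fullwidth_digits))  # True
print(name.isidentifier())                                             # True

# NFKC normalization maps the fullwidth form onto the ASCII digit,
# so the two spellings are distinct strings before normalization only.
print(name == "x1")                                 # False
print(unicodedata.normalize("NFKC", name) == "x1")  # True
```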