[Python-3000] Support for PEP 3131

Ka-Ping Yee python at zesty.ca
Sat May 26 00:45:18 CEST 2007


On Thu, 24 May 2007, Stephen J. Turnbull wrote:
>  > You've got this backwards, and I suspect that's part of the root of
>  > the disagreement.  It's not that "when humans enter the loop they
>  > cause problems."  The purpose of the language is to *serve humans*.
[...]
> N.B. I take offense at your misquote.  *Humans do not cause problems.*
> It is *non-ASCII tokens* that *cause* the (putative) problem.  However,
> the alleged problems only arise when humans are present.

Oh, I apologize.  I misunderstood the antecedent of "they".

>  > The grammar has to be something a human can understand.
>
> There are an infinite number of ASCII-only Python tokens.  Whether
> those tokens are lexically composed of a small fixed finite alphabet
> vs. a large extensible finite alphabet doesn't change anything in
> terms of understanding the *grammar*.

I understand that you're talking about grammar as distinct from
lexical syntax -- I was using the word "grammar" to refer to everything.
I probably should have used the word "syntax" instead.

My point was just that you have to be able to tell what a token is before
you can read the syntax.  That's hard to do if you don't know what
characters are allowed and what characters aren't (and if there isn't
even a consensus on what should be allowed).

> The question is how expensive will the upgrade be, and what are the
> benefits.  My experience suggests that the cost is negligible *because
> most users won't use non-ASCII identifiers*, and they'll just stick
> with their ASCII-only tools.

That's exactly the danger.  It's a change that makes almost everyone's
tools and practices subtly, occasionally, and silently incorrect --
even unconsciously incorrect for many.  That's much worse than a
change that is obvious enough to force a correction in assumptions.

That just means, if we're going to provide this feature, we shouldn't
force subtle wrongness upon people by making it the default.  The
balance you're talking about weighs heavily in favour of ASCII by
default because that is what 100% of Python programs use now, it is
what the vast majority of Python programs will use in the future,
and it is what the vast majority of Python users will assume to be
the case for quite some time.
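
To make the "silently incorrect" case concrete, here is a rough sketch of
the kind of ASCII-only identifier pattern that many existing tools
hard-code.  The pattern and the function name are made up for
illustration, not taken from any real tool.  Fed a line containing a
non-ASCII identifier, it raises no error; it just quietly splits the
name into the wrong pieces.

```python
import re

# Hypothetical pattern: the ASCII-only notion of "identifier" that many
# existing tools assume without saying so.
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def find_identifiers(source):
    """Return every substring the ASCII-only pattern takes for an identifier."""
    return ASCII_IDENT.findall(source)

ascii_line = "total = price + tax"
# \u00e9 is LATIN SMALL LETTER E WITH ACUTE, written as an escape so the
# example does not depend on the encoding of this message.
mixed_line = "prix_d\u00e9taill\u00e9 = prix + taxe"

print(find_identifiers(ascii_line))  # ['total', 'price', 'tax']
print(find_identifiers(mixed_line))  # ['prix_d', 'taill', 'prix', 'taxe']
```

The second result is wrong, but nothing in the tool's behaviour tells
the user so.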

> And there are cases (Dutch tax law, Japanese morphology) where having
> a judicious selection of non-ASCII identifiers is very convenient.

Yes, granted.

>  > This should be built in to the Python interpreter and on by default,
>  > unless it is turned off by a command-line switch that says "I want to
>  > allow the full set of Unicode identifier characters in identifiers."
>
> I'd make it more tedious and more flexible to relax the restriction,
> actually.  "python" gives you the stdlib, ASCII-only restriction.
> "python -U TABLE" takes a mandatory argument, which is the table of
> allowed characters.  If you want to rule out "stupid file substitution
> tricks", TABLE could take the special arguments "stdlib" and "stduni"
> which refer to built-in tables.  But people really should be able to
> restrict to "Japanese joyo kanji, kana, and ASCII only" or "IBM
> Japanese only" as local standards demand, so -U should also be able to
> take a file name, or a module name, or something like that.

I strongly support this idea.  It's the best proposal I've heard so far.
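
As a rough sketch of how such a table might be applied (this is a
stand-alone checker, not the proposed interpreter switch; the names
TABLES, load_table and disallowed_identifiers are hypothetical, and the
exact contents of "stdlib" and "stduni" are my assumption):

```python
import io
import string
import tokenize

# Hypothetical built-in tables named after the "stdlib" and "stduni"
# arguments in the proposal.  None stands for "no extra restriction".
TABLES = {
    "stdlib": frozenset(string.ascii_letters + string.digits + "_"),
    "stduni": None,
}

def load_table(spec):
    """Resolve a spec: a built-in table name, or an iterable of allowed characters."""
    if isinstance(spec, str) and spec in TABLES:
        return TABLES[spec]
    return frozenset(spec)

def disallowed_identifiers(source, spec="stdlib"):
    """Return (name, line) pairs for identifiers that use characters outside the table."""
    allowed = load_table(spec)
    if allowed is None:
        return []
    bad = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not set(tok.string) <= allowed:
            bad.append((tok.string, tok.start[0]))
    return bad

# \u00f6 is o-umlaut, \u00df is sharp s.
source = "gr\u00f6\u00dfe = 3\nwidth = gr\u00f6\u00dfe * 2\n"
local_table = string.ascii_letters + string.digits + "_" + "\u00f6\u00df"

print(disallowed_identifiers(source, "stdlib"))     # both uses are flagged
print(disallowed_identifiers(source, "stduni"))     # []
print(disallowed_identifiers(source, local_table))  # []
```

A locally defined table, as in the last call, is what would let a site
restrict identifiers to whatever its own standards demand.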

>  > If we are going to allow Unicode identifiers at all, then I would
>  > recommend only allowing identifiers that are already normalized
>  > (in NFC).
>
> Already in the PEP.

The PEP says that Python will *convert* the identifiers into NFC.
I'd rather there not be lots of different ways to write the same
identifier (TOOWTDI), so this particular recommendation is that
identifiers in source code have to already be normalized.
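
A small sketch of the difference, using the standard unicodedata module
(the helper name is_nfc is just for illustration).  The two strings
below display identically but are different code point sequences;
converting would accept both, whereas the stricter rule would reject
the second.

```python
import unicodedata

def is_nfc(identifier):
    """True if the identifier is already in Normalization Form C."""
    return unicodedata.normalize("NFC", identifier) == identifier

composed = "caf\u00e9"     # e-acute as a single code point
decomposed = "cafe\u0301"  # plain 'e' followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(is_nfc(composed), is_nfc(decomposed))                  # True False
```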

>  > The ideas that I'm in favour of include:
>  >
>  >     (e) Use a character set that is fixed over time.
>
> The BASIC that I learned first only had 26 user identifiers.  Maybe
> that's the way we should go?<duck />

The solution you propose solves this nicely.


-- ?!ng

