PEP 3131: Supporting Non-ASCII Identifiers

Sun May 20 19:34:13 EDT 2007

On May 17, 5:03 pm, "sjdevn... at yahoo.com" <sjdevn... at yahoo.com> wrote:
> On May 16, 6:38 pm, r... at yahoo.com wrote:
> > Are you worried that some 3rd-party package you have
> > included in your software will have some non-ascii identifiers
> > buried in it somewhere?  Surely that is easy to check for?
> > Far easier than checking that it doesn't have some trojan
> > code in it, it seems to me.
>
> What do you mean, "check for"?  If, say, numeric starts using math
> characters (as has been suggested), I'm not exactly going to stop
> using numeric.  It'll still be a lot better than nothing, just
> slightly less better than it used to be.

The PEP explicitly states that no non-ascii identifiers
will be permitted in the standard library.  The opinions
expressed here seem almost unanimous that non-ascii
identifiers are a bad idea in any sort of shared public
code.  Why do you think the occurrence of non-ascii
identifiers in NumPy is likely?
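
As for checking a package for non-ascii identifiers, here is
a minimal sketch of how that could be done.  It assumes
Python 3 source (where the PEP would apply) and uses the
standard tokenize module.  The function name
non_ascii_identifiers is made up for illustration; it is
not part of any library.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return sorted (line_number, name) pairs for every identifier
    in `source` that contains a non-ASCII character.

    Hypothetical helper for illustration only.
    """
    found = set()
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # Identifiers (and keywords) are NAME tokens; string literals
        # and comments have their own token types and are skipped.
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            found.add((tok.start[0], tok.string))
    return sorted(found)
```

Running something like this over each .py file of a package
would show whether, and where, such identifiers occur.
Non-ascii text inside strings and comments is not flagged.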

> > > And I'm often not creating a stack trace procedure, I'm using the
> > > built-in python procedure.
> >
> > > And I'm often dealing with mailing lists, Usenet, etc where I don't
> > > know ahead of time what the other end's display capabilities are, how
> > > to fix them if they don't display what I'm trying to send, whether
> > > intervening systems will mangle things, etc.
> >
> > I think we all are in this position.  I always send plain
> > text mail to mailing lists, people I don't know etc.  But
> > that doesn't mean that email software should be constrained
> > to only 7-bit plain text, no attachments!  I frequently use
> > such capabilities when they are appropriate.
>
> Sure.  But when you're talking about maintaining code, there's a very
> high value to having all the existing tools work with it whether
> they're wide-character aware or not.

I agree.  On Windows I often use Notepad to edit
Python files.  (There goes my credibility! :-)
So I don't like tab-only indent proposals that assume
I can set tabs to be an arbitrary number of spaces.
But tab-only indentation would affect every Python
program and every Python programmer.

In the case of non-ascii identifiers, the potential
gains are so big for non-English speakers, and (IMO)
the difficulty of working with non-ascii identifiers
times the probability of having to work with them
so low, that the former clearly outweigh the latter.

> > If your response is, "yes, but look at the problems html
> > email, virus-infected attachments, etc. cause", the situation
> > is not the same.  You have little control over what kind of
> > email people send you but you do have control over what
> > code, libraries, patches, you choose to use in your
> > software.
> >
> > If you want to use ascii-only, do it!  Nobody is making
> > you deal with non-ascii code if you don't want to.
>
> Yes.  But it's not like this makes things so horribly awful that it's
> worth my time to reimplement large external libraries.  I remain at -0
> on the proposal;

> it'll cause some headaches for the majority of
> current Python programmers, but it may have some benefits to a
> sizeable minority

This is the crux of the matter, I think: the fear that
non-ascii identifiers will spread like a virus, infecting
program after program until every piece of Python code
is nothing but a mass of writhing, unintelligible
non-ascii characters.  (OK, maybe I am overstating a little. :-)

I (and I think other proponents) don't think this is
likely to happen, and the benefits to non-English
speakers of being able to write maintainable code far
outweigh the very rare case when it does occur.

> and may help bring in new coders.  And it's not
> going to cause flaming catastrophic death or anything.