[Python-3000] Support for PEP 3131

Stephen J. Turnbull stephen at xemacs.org
Thu May 24 13:55:01 CEST 2007


Jim Jewett writes:

 > I would like an alert (and possibly an import exception) on any code
 > whose *executable portion* is not entirely in ASCII.

Are you talking about language definition or implementation?  I like
the idea of such checks, as long as they are not mandatory in the
language and can be turned off easily at run time in the default
configuration.  I'd also really like a generalization (described below).
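
A minimal sketch of what such an optional check might look like,
assuming that "executable portion" means identifier tokens (so that
comments and string literals are ignored).  The names non_ascii_names
and audit_source and the strict/enabled switches are hypothetical, not
part of any proposal:

```python
import io
import tokenize
import warnings


def non_ascii_names(source):
    """Return (identifier, line) pairs for NAME tokens with non-ASCII characters."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are inspected; COMMENT and STRING tokens are skipped.
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            found.append((tok.string, tok.start[0]))
    return found


def audit_source(source, filename="<string>", strict=False, enabled=True):
    """Warn about non-ASCII identifiers, or raise ImportError when strict is set."""
    if not enabled:  # the check can be switched off entirely at run time
        return []
    hits = non_ascii_names(source)
    if hits and strict:
        raise ImportError("%s: non-ASCII identifiers: %r" % (filename, hits))
    for name, line in hits:
        warnings.warn("%s:%d: non-ASCII identifier %r" % (filename, line, name))
    return hits
```

Because the check lives outside the language definition, the default
could be a warning, with the exception reserved for sites that ask
for it.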

 > > The only issues PEP 3131 should be concerned with *defining*
 > > are those that cause problems with canonicalization, and the range of
 > > characters and languages allowed in the standard library.
 > 
 > Fair enough -- but the problem is that this isn't a solved issue yet;

IMHO the stdlib *is* a solved issue.  The PEP says "in the standard
library, we use ASCII only, except in tests and the like," and "we use
English unless there is no reasonable equivalent in English."  That's
right.

AFAIK *canonicalization* is also a solved issue (although exactly what
"NFC" means might change with Unicode errata and of course with future
addition of combining characters or precombined characters).
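
To illustrate what canonicalization buys, here is a small sketch
using the standard unicodedata module.  The helper name canonical and
the sample identifier are only for illustration:

```python
import unicodedata


def canonical(identifier):
    """Return the NFC form of an identifier."""
    return unicodedata.normalize("NFC", identifier)


# The same visible name spelled two ways: "e" followed by a combining
# acute accent (U+0301), and the precomposed character U+00E9.
decomposed = "cafe\u0301"
precomposed = "caf\u00e9"

print(decomposed == precomposed)                        # raw strings differ
print(canonical(decomposed) == canonical(precomposed))  # NFC forms agree
```

The exact output of normalize depends on the Unicode version shipped
with the interpreter, which is the caveat about errata and future
additions mentioned above.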

The notion of "identifier constituent" is a bit thorny.  As I
understand it, Cf (format) characters generally don't belong in
identifiers, but UAX#31 has some odd references to ZWJ and ZWNJ that I
don't understand.  I
say "leave them out until somebody named 'Bhattacharya' says 'Hey! I
need that!'"<wink>  In general, when in doubt, leave it out.

And prohibit it.  I think it's a very bad idea to give identifier
authors *any* control over how identifiers are presented to readers.  If an
editor has a broken or nonexistent bidi implementation, for example,
its user is probably used to that.  With *sufficient* breakage in a
presentation algorithm, I suppose that the same identifier could be
presented differently in different contexts, and that different
identifiers could be presented identically.  But that's not Python's
problem.  This can easily happen in ASCII, too.  (Consider an editor
that truncates display silently at column 80.)
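
The "leave it out, and prohibit it" rule for format characters could
be sketched as follows.  This assumes the rule is simply "reject
anything whose general category is Cf"; the function names are
hypothetical:

```python
import unicodedata


def format_chars(identifier):
    """Return the characters of identifier whose general category is Cf."""
    return [c for c in identifier if unicodedata.category(c) == "Cf"]


def check_no_format_chars(identifier):
    """Raise ValueError if identifier contains any Cf character."""
    bad = format_chars(identifier)
    if bad:
        codes = ", ".join("U+%04X" % ord(c) for c in bad)
        raise ValueError("format characters not allowed: " + codes)
    return identifier
```

ZWJ (U+200D) and ZWNJ (U+200C) both have category Cf, so a name
containing either one would be rejected under this rule.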

 > Even having read their reports, my initial rules would still have
 > banned mixed-script, which would have prevented your edict-
 > example.

Urk.  I see your point (Ka-Ping's Cyrillic example makes it glaringly
clear why that's the conservative way to go).  I don't have to like
it, but I could live with it.  (Especially since "edict-" is a poor
man's namespace.  That device isn't needed in Python.)
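
A rough sketch of a mixed-script check.  Python's unicodedata module
does not expose the Unicode Script property, so this uses the first
word of each letter's character name as a crude stand-in.  That is an
assumption made for illustration, not a real script classifier, and
the sample identifiers are invented:

```python
import unicodedata


def scripts(identifier):
    """Return a set of rough script labels for the letters in identifier."""
    labels = set()
    for c in identifier:
        if c.isalpha():  # skip digits and underscores
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            labels.add(unicodedata.name(c, "UNKNOWN").split()[0])
    return labels


def is_mixed_script(identifier):
    """True if the letters of identifier appear to come from several scripts."""
    return len(scripts(identifier)) > 1
```

Under this rule a name such as "edict_" followed by kanji counts as
mixed (LATIN plus CJK), and so does a Latin-looking name with one
Cyrillic letter substituted in.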

 > > I propose it would be useful to provide a standard mechanism for
 > > auditing the input stream.  There would be one implementation for the
 > > stdlib ....  A second ....  A third, ....
 > 
 > This might deal with my concerns.  It is a bit more complicated than
 > the current plans.

Well, what I *really* want is a loadable table.  My motivation is that
I want organizations to be able to "enforce" a policy that is less
restrictive than "ASCII-only" but more restrictive than "almost
anything goes".  My students don't need Sanskrit; Guido's tax
accountant doesn't need kanji, and neither needs Arabic.  I think that
they should be able to get the same strict "alert or even import
exception" (that you want on non-ASCII) for characters outside their
larger, but still quite restricted, sets.
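
A minimal sketch of the loadable-table idea, assuming the table is
just a list of inclusive code point ranges stored as JSON.  The
format, the function names, and the sample ranges are all
hypothetical:

```python
import json


def load_table(text):
    """Parse a JSON list of [first, last] code point pairs into tuples."""
    return [(int(lo), int(hi)) for lo, hi in json.loads(text)]


def disallowed(identifier, table):
    """Return the characters of identifier that fall outside every range."""
    return [c for c in identifier
            if not any(lo <= ord(c) <= hi for lo, hi in table)]


def enforce(identifier, table, strict=True):
    """Raise ImportError on a disallowed character if strict, else report it."""
    bad = disallowed(identifier, table)
    if bad and strict:
        raise ImportError("characters outside site policy: %r" % bad)
    return bad


# Sample policy: ASCII, kana (U+3040..U+30FF) and the main CJK ideograph
# block (U+4E00..U+9FFF), written in decimal because JSON has no hex.
SAMPLE_POLICY = "[[0, 127], [12352, 12543], [19968, 40959]]"
```

With that sample table, a kanji identifier passes while one containing
a Devanagari letter is reported or rejected, which is the "larger, but
still quite restricted" behavior described above.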

