[Python-3000] Support for PEP 3131

Stephen J. Turnbull stephen at xemacs.org
Fri May 25 13:13:03 CEST 2007


Jim Jewett writes:

 > Definition; I don't care whether it is a different argument to import
 > or a flag or an environment variable or a command-line option, or ...
 > I just want the decision to accept non-ASCII characters to be
 > explicit.

Ka-Ping's tricky.py shows that reliance on magic directives a la PEP
263 loses.  I agree with Martin that in practice most such hacks will
get caught in the ordinary process of editing, applying patches,
sending email, and the like, but if the compiler is going to do the
checking on behalf of the *user*, it should not rely on anything the
files say.

 > Ideally, it would even be explicit per extra character allowed, though
 > there should obviously be shortcuts to accept entire scripts.

How about a regexp character class as starting point?
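
A minimal sketch of what that starting point might look like, using Python's
own `re` module. The class string, the name `EXTENSION_CLASS`, and the helper
`identifier_allowed` are all hypothetical; a real table would presumably be
named on the command line rather than hard-coded.

```python
import re

# Hypothetical user-supplied class: lowercase Latin-1 letters and
# lowercase Greek letters.  Purely illustrative.
EXTENSION_CLASS = "[\u00e0-\u00f6\u00f8-\u00ff\u03b1-\u03c9]"


def identifier_allowed(name, extension_class=EXTENSION_CLASS):
    """Return True if every non-ASCII character in `name` matches the class.

    ASCII characters are skipped here; they are assumed to be checked by
    the ordinary identifier rules.
    """
    allowed = re.compile(extension_class)
    return all(ord(ch) < 128 or allowed.fullmatch(ch) is not None
               for ch in name)
```

With this class, `identifier_allowed("na\u00efve")` is accepted, while a
Cyrillic name is rejected because none of its characters were listed.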

 > So how about
 > 
 > (1)  By default, python allows only ASCII.

+1

But neither Martin nor Guido likes it, so I'm continuing to think
about it.  Martin's objection that people will try it and assume that
it's unimplemented smells like FUD to me, though.

 > (2)  Additional characters are permitted if they appear in a table
 > named on the command line.

+1

 > These additional characters should be restricted to code points larger
 > than ASCII (so you can't easily turn "!" into an ID char)

+1

You can specify any character you want, but if it's ASCII, or not in
the classes PEP 3131 ends up using to define the maximal set, it gets
deleted from the extension table (ASCII has its own table,
conceptually).  This permits whole scripts, blocks, or ranges to be
included.
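
Roughly, the filtering could be sketched as below. The set of general
categories is an assumption standing in for whatever PEP 3131 ends up using,
and `filter_extension_table` is a hypothetical name, not anything the
compiler provides.

```python
import unicodedata

# Assumption: general categories loosely modelled on the identifier
# classes under discussion for PEP 3131; the final set may differ.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
                      "Mn", "Mc", "Nd", "Pc"}


def filter_extension_table(requested):
    """Split requested characters into (kept, deleted) sets.

    A character is deleted if it is ASCII (ASCII has its own table) or if
    its general category is outside the maximal set.
    """
    kept, deleted = set(), set()
    for ch in requested:
        if ord(ch) < 128 or unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            deleted.add(ch)
        else:
            kept.add(ch)
    return kept, deleted
```

Because unassigned and private-use code points have categories Cn and Co,
they fall out of the table the same way punctuation does.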

Optionally warn about such deletions when the table is loaded (though
that would be better done by a separate tool), but preferably throw a
SyntaxError when parsing the identifier:

    """This character is in the table of extension characters for
    identifiers, but is of class Cf, which is forbidden in identifiers."""
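
A sketch of the parse-time check, assuming the table is kept as the user
wrote it and forbidden entries are only reported when they actually show up
in an identifier. The category set and the name `check_identifier` are
hypothetical, as is the wording of the second message.

```python
import unicodedata

# Assumption: stand-in for the maximal set PEP 3131 ends up defining.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
                      "Mn", "Mc", "Nd", "Pc"}


def check_identifier(name, table):
    """Raise SyntaxError if `name` uses a non-ASCII character that is
    either missing from the extension table or of a forbidden class."""
    for ch in name:
        if ord(ch) < 128:
            continue  # ASCII is handled by the ordinary identifier rules
        if ch not in table:
            raise SyntaxError(
                "U+%04X is not in the table of extension characters "
                "for identifiers." % ord(ch))
        category = unicodedata.category(ch)
        if category not in ALLOWED_CATEGORIES:
            raise SyntaxError(
                "This character is in the table of extension characters "
                "for identifiers, but is of class %s, which is forbidden "
                "in identifiers." % category)
```

For example, listing U+200B (ZERO WIDTH SPACE) in the table does not help:
`check_identifier("a\u200bb", {"\u200b"})` still raises, naming class Cf.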

 > If you want to include punctuation or

-1

Why waste the effort of the Unicode technical committees?

 > undefined characters, so be it.

-1

Assuming undefined == reserved for future standardization, that
violates the Unicode standard.

-1 on private space characters

You *could* argue that a private space character could be valid within
a module, or an application of cooperating modules, but I don't think
it's worth trying to deal with it.  "I'm from Kansas, show me" (a use
case).


