[Python-3000] Support for PEP 3131

Stephen J. Turnbull stephen at xemacs.org
Sun May 27 14:59:08 CEST 2007


James Y Knight writes:

 > If the identifier syntax is changed to include unicode, all python  
 > modules are still usable everywhere. Once you start going down the  
 > road of configurable syntax (worse: globally configurable syntax),

The syntax is not "configured", it is "audited".  Just like Unix
passwords, which can be anything in principle, but most distros audit
them (unless assigned by root).

Now, Ka-Ping Yee and Josiah Carlson clearly would like to see the
restriction in the language.  That's not where I'm going.  I see PEP
3131 as defining the language.

However, I do think that a limited amount of *optional* auditing *in
the Python compiler* would be a good idea to have, especially for
Americans who (along with everybody else) have *no* need for Unicode
identifier support now, and are not going to have a need for a long
time on average.  Better they should get a heads-up when the Klingons
arrive.

 > there will be a "second class" of python modules that won't work on  
 > some systems without extra pain.

That's right.  It's all modules that contain non-ASCII identifiers,
because by PEP 3131 they cannot be distributed with Python as part of
the standard library.

The question is how much extra pain, and will it actually hinder u

 > It started with a simple "-U", grew into a "-U <language>", grew into

Actually, it started with plugging into the codec interface, with
"ASCII-only" and "PEP 3131" auditors available by default.
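
As a rough illustration of what an "ASCII-only" auditor could check, here is a minimal sketch. It does not plug into the codec interface; it is a standalone function that tokenizes source text and reports identifiers containing non-ASCII characters. The name non_ascii_identifiers is hypothetical, and the sketch assumes a Python with str.isascii (3.7 or later).

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (name, line, column) for each identifier with non-ASCII characters.

    Hypothetical helper, not part of any proposal: it only looks at NAME
    tokens, so non-ASCII text in strings and comments is not reported.
    """
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append((tok.string, tok.start[0], tok.start[1]))
    return found
```

A site that wants the heads-up could run this over a module before importing it and warn, or refuse, when the returned list is non-empty.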

 > a 'pyidchar.txt' file with a list of character ranges, and now that  
 > pyidchar.txt file is going to have separate sections based on module  
 > name? Sorry, but are you !@# kidding me?!?

The scalability issue was raised by Guido, not the ASCII advocates.

To answer how I view this: no, I'm not kidding, and I won't be until the
vaporware auditing programs get field-tested, and we've actually seen a
couple of exploits of unwary sites and discovered that they're the ones
the auditing programs already catch, not something unexpected.

In any case, I expect that the most commonly used version of that file
will look like

    [DEFAULT]
    000000-1FFFFF    # all of Unicode as restricted by PEP 3131

    # pyidchar.txt ends here

Anything more complicated than that is a convenient standardized
format for filters that can be shared among the seriously paranoid.
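
To make the file format concrete, here is a minimal sketch of how such a file might be read and applied. Everything in it is an assumption for illustration: the names parse_idchar and identifier_allowed are hypothetical, ranges are taken to be inclusive hex code points, "#" is taken to start a comment, and a per-module section is taken to replace [DEFAULT] rather than add to it. It checks only the ranges, not the PEP 3131 identifier rules themselves.

```python
def parse_idchar(text):
    """Parse sections of hex code point ranges into {section: [(lo, hi), ...]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("range outside any section: %r" % raw)
        lo, _, hi = line.partition("-")
        # A bare code point with no "-" is treated as a one-character range.
        sections[current].append((int(lo, 16), int(hi or lo, 16)))
    return sections


def identifier_allowed(name, sections, module="DEFAULT"):
    """True if every character of name lies in a range for module.

    Falls back to the DEFAULT section when module has no section of its own
    (an assumed rule, chosen for simplicity).
    """
    ranges = sections.get(module, sections.get("DEFAULT", []))
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in name)
```

With the permissive file shown above, every identifier passes; a paranoid site would swap in a [DEFAULT] of 000000-00007F and get ASCII-only auditing from the same code.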


