[Python-3000] Support for PEP 3131

James Y Knight foom at fuhm.net
Thu May 24 23:47:45 CEST 2007

On May 24, 2007, at 5:04 PM, Ka-Ping Yee wrote:
>> (1)  By default, python allows only ASCII.
>> (2)  Additional characters are permitted if they appear in a table
>> named on the command line.
> +1!  This is a fine solution.  It is better than the "python -U"
> option I proposed -- it has all the advantages of that proposal, plus:
>     - The identifier character set won't spontaneously change when
>       one upgrades to a new version of Python, even for users of
>       non-ASCII identifiers.

FUD. It already won't: Unicode explicitly makes that promise. They can
add characters, but not remove them.

>     - Having to specify the table of acceptable characters
>       demonstrates at least some knowledge of the character set
>       one is using.

This is a negative. Why should I have to show knowledge of the  
character set I'm using to type the characters?

>     - It provides the flexibility for different communities to
>       adopt identifier conventions that suit their preferred
>       tradeoff of risk vs. expressiveness.

Also a negative. Now, if I want to run the modules from multiple  
communities I need to figure out how to merge the tables they have to  
separately distribute with their modules.

> Jim's proposal appears to be the best path to making everyone happy.

Nope. It does nobody any good. It may make people who fear non-ASCII
code happy, but only because it totally castrates this feature for
people who do want to use non-ASCII identifiers.

It really seems to me people are spewing a lot of FUD here. Rejecting
certain characters when loading a file is simply not necessary. Either:


a) you trust that the author of the file has authored it correctly,  
in which case it doesn't matter one bit what character set they used.  
Restricting the charset at import time is just something to get in  
your way with no actual value.

b) you don't trust the code, and want to inspect it.

Okay, in this case you actually have to inspect the *code* --  
checking the character set is an utterly useless thing to do by  
itself. It tells you nothing useful.

While checking the code, you may want to have strange characters
outside your comfort range flagged for you. Either grep or editor
support is a simple enough solution for this. Or, let's say your
editor is unable to highlight suspicious characters, and you want to
find identifiers with strange characters, and not get tripped up on
comments. Fine, make a tool that uses the compiler.parser module to
iterate over identifiers in the source code.
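
A minimal sketch of such a tool, for illustration only. It assumes a
Python 3.7+ interpreter and swaps in the standard tokenize module in
place of the compiler-based approach mentioned above; the function name
non_ascii_identifiers and the sample source are hypothetical.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Yield (line, column, name) for each identifier with non-ASCII characters."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens cover identifiers and keywords. Keywords are pure
        # ASCII, so the isascii() check filters them out. Comments and
        # string literals arrive as COMMENT and STRING tokens and are
        # never reported.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], tok.start[1], tok.string


# Hypothetical sample: non-ASCII text in a comment, in two identifier
# uses, and in a string literal.
sample = (
    "# commentaire avec un é\n"
    "résumé = 1\n"
    "total = résumé + 1\n"
    "label = 'café'\n"
)

for line, col, name in non_ascii_identifiers(sample):
    print(f"{line}:{col}: {name}")
```

Only the two uses of the identifier are reported; the comment and the
string literal are ignored, which is the point of going through the
tokenizer rather than a plain grep.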

Adding baroque command line options for users of other languages to  
do some useless verification at import time is not an acceptable  
answer. It'd be better to just reject the PEP entirely.

