[Python-3000] PEP 3131 accepted

Wed May 23 18:52:25 CEST 2007

Ka-Ping Yee wrote:
 > But with Unicode identifiers you have no way to know even whether you
 > should be suspicious.  You would feel confident that you know what
 > a simple piece of code does, and yet be wrong.

Also, Jim Jewett wrote:
 > Strings aren't a problem unless I evaluate them.

a = """This string has a triple quote and a command in it. \"""
os.remove("*")
"""

If that \ is merely a Unicode character that looks like \, you've just
deleted your hard drive.  (To close it off, you could use """, where the
middle quote is a Unicode character that looks like ".)  That's two strings
with executable code between them, which looks like one harmless string.

Actually, I think that could be shortened to:
a = """
os.remove("*")
"""
where the middle character of each """ is not actually a ".
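To make the trick concrete, here is a minimal sketch that asks Python's own `ast` module how many top-level statements each variant really contains. The specific lookalikes are my assumptions, not something the post names: U+FF3C FULLWIDTH REVERSE SOLIDUS stands in for \ and U+02BA MODIFIER LETTER DOUBLE PRIME stands in for ".  A harmless print() replaces os.remove("*") so it is safe to run, and the helper name top_level_statements is hypothetical.

```python
import ast

# Assumed lookalike characters (illustrative choices, not from the post).
FAKE_BACKSLASH = "\uff3c"  # FULLWIDTH REVERSE SOLIDUS, looks like "\"
FAKE_QUOTE = "\u02ba"      # MODIFIER LETTER DOUBLE PRIME, looks like '"'

# Harmless stand-in for os.remove("*").
PAYLOAD = 'print("this line runs")'


def top_level_statements(source: str) -> list[str]:
    """Parse source (without running it) and name its top-level statements."""
    return [type(node).__name__ for node in ast.parse(source).body]


# The original snippet with a real backslash: \" escapes one quote, so the
# payload stays inside a single triple-quoted string.
honest = (
    'a = """This string has a triple quote and a command in it. \\"""\n'
    + PAYLOAD + "\n"
    + '"""\n'
)

# Lookalike backslash escapes nothing, so '"""' ends the string early; the
# closing delimiter is quote + lookalike + quote, a complete short string.
backslash_trick = (
    'a = """This string has a triple quote and a command in it. '
    + FAKE_BACKSLASH + '"""\n'
    + PAYLOAD + "\n"
    + '"' + FAKE_QUOTE + '"\n'
)

# Shortened version: each apparent '"""' is really quote, lookalike, quote.
short_trick = f'a = "{FAKE_QUOTE}"\n{PAYLOAD}\n"{FAKE_QUOTE}"\n'

for name, src in [("honest", honest),
                  ("backslash_trick", backslash_trick),
                  ("short_trick", short_trick)]:
    print(name, top_level_statements(src))
```

The honest version parses to a single Assign, while both tricks parse to an Assign, an Expr wrapping the payload call, and a stray Expr string, so the "string" contents would actually execute.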

My point here is that if you're confident that you know what a simple 
piece of code does, you're already wrong.  Unicode identifiers don't 
change that.

 > But there is no way to tell by looking at it whether it works or not.
 > If all three occurrences of 'allow' are spelled with ASCII characters,
 > it will work.  If the second occurrence of 'allow' is spelled with a
 > Cyrillic 'a' (U+0430), you have a silent security hole.

If you search for "allow", it'll only match the ones that actually
match.  Yes, it makes patch reviewers' jobs harder, or means the tools
they use to do their jobs need to be smarter.  No, I don't think it's
as bad as you think it is.  And heck, if you're a patch reviewer, set
the ASCII-only flag on your version of Python, or run a program before
checking it in to flag non-ASCII characters, and reject all patches from
that person in the future, since clearly they're a black hat.
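The "run a program before checking it in" idea needs very little code.  Below is a minimal sketch using only the standard library; the function name find_non_ascii and the sample patch are hypothetical (the patch is my own illustration of a Cyrillic 'a' in 'allow', not the code from the quoted message).  It reports each non-ASCII character with its position and Unicode name, and also shows that a plain search for "allow" skips the spoofed line.

```python
import unicodedata


def find_non_ascii(source: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, char, Unicode name) for each non-ASCII char.

    Line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "<unnamed>")))
    return hits


# Hypothetical patch: the second 'allow' starts with U+0430 CYRILLIC SMALL
# LETTER A, so it assigns to a different variable than the one tested below.
patch = (
    "allow = False\n"
    "if user.is_admin:\n"
    "    \u0430llow = True\n"
    "if allow:\n"
    "    grant_access()\n"
)

for lineno, col, ch, name in find_non_ascii(patch):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X} {name}")

# A plain-text search for "allow" only finds the genuinely ASCII occurrences.
print([n for n, line in enumerate(patch.splitlines(), start=1) if "allow" in line])

# NFKC normalization, which Python 3 applies to identifiers, does not fold
# the Cyrillic letter into Latin 'a', so these really are two names.
print(unicodedata.normalize("NFKC", "\u0430llow") == "allow")
```

Run on the sample patch, it flags line 3, column 5, and the search matches only lines 1 and 4.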

Also, I find it strangely amusing that complaints about characters that
look the same as other characters come from someone named "?!ng".  :)

Later,
314|<3.