[Python-3000] PEP 3131 accepted

Wed May 23 13:18:59 CEST 2007

On Wed, 23 May 2007, Ian D. Bollinger wrote:
> Ka-Ping Yee wrote:
> >     2.  Python will become vulnerable to a new class of security
> >         exploits via the writing of misleading or malicious code
> >         that is visually indistinguishable from correct code.
> >         Consequently it will be more difficult for humans to
> >         inspect code and assure its correctness or trustworthiness.
> >         There is very little established best practice for
> >         addressing homograph security issues.
> >
> Isn't it already easy enough to do that today?

There are two simultaneous errors in reasoning here.  First, the fact
that one can write confusing code today is not a reason to enable the
writing of even more confusing code.

Second, the Unicode identifier issue is different from the example you
give here.  In your example, it is obvious that the code is doing
something hard to understand; if I showed you something like this and
asked you what it did, you would think "hmm, that looks obfuscated":

>  >>> import base64; exec
> base64.decodestring('cHJpbnQgJ0hlbGxvLCB3b3JsZCEn\n')
> ... Hello, world!

But with Unicode identifiers you have no way to know even whether you
should be suspicious.  You would feel confident that you know what
a simple piece of code does, and yet be wrong.  For example, this
looks like a normal fragment of code:

    def remove_if_allowed(user, filename):
        allow = 1
        for group in disabled_groups:
            if user in group:
                allow = 0
        if allow:
            os.remove(filename)

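But there is no way to tell by looking at it whether it works or not.
If all three occurrences of 'allow' are spelled with ASCII characters,
it will work.  If the second occurrence of 'allow' is spelled with a
Cyrillic 'a' (U+0430), the assignment inside the loop binds a different
variable, the real 'allow' stays 1, and the file is removed even for
users in a disabled group: a silent security hole.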
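
To make the failure concrete, here is a minimal sketch, assuming an
interpreter that implements PEP 3131 (Python 3).  The Cyrillic letter is
written as the escape \u0430 so that it stays visible in this message, and
the function takes 'disabled_groups' and a 'remove' callback as parameters
so that the demonstration deletes nothing; both are my additions for
illustration and are not part of the fragment above.

```python
import unicodedata

ascii_name = "allow"
# Spoofed spelling: the first letter is CYRILLIC SMALL LETTER A (U+0430).
spoofed_name = "\u0430llow"

print(ascii_name == spoofed_name)          # the two names are different strings
print(unicodedata.name(spoofed_name[0]))   # official Unicode name of the first letter

# Same logic as the fragment above, with the second 'allow' spoofed.
# 'disabled_groups' and 'remove' are passed in (an assumption made for
# this demo) so that no real file is touched.
source = f'''
def remove_if_allowed(user, filename, disabled_groups, remove):
    allow = 1
    for group in disabled_groups:
        if user in group:
            {spoofed_name} = 0
    if allow:
        remove(filename)
'''

namespace = {}
exec(source, namespace)

removed = []
# "mallory" is in a disabled group, so nothing should be removed.
namespace["remove_if_allowed"]("mallory", "secret.txt", [{"mallory"}], removed.append)
print(removed)
```

The spoofed assignment creates a second local variable, so the check on
the ASCII 'allow' still passes and the removal goes ahead for a user who
should have been refused.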

Now imagine that this is part of an open-source project that accepts
patches from the community, and senior developers check in the patches
after reviewing them.  The use of Unicode identifiers opens the door
for someone to introduce a security hole that is guaranteed to be
undetectable by reading the code, no matter how carefully anyone reads it.

Will this be caught?  Maybe someone will test the routine; maybe not.
Either way, it is clear that the reviewer's job has just gotten much
more difficult, and accepting patches is much more dangerous as a
result of PEP 3131.
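
Reading cannot catch this, so any check has to be mechanical.  As a rough
sketch only, and not something PEP 3131 provides, a reviewer could run a
patch through the standard 'tokenize' module and list every identifier
that contains a non-ASCII character.  The helper name
'non_ascii_identifiers' is hypothetical, and the check is deliberately
crude: it flags all non-ASCII identifiers, legitimate or not.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, column, name) for each identifier with a non-ASCII character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


# A two-line patch whose second 'allow' starts with Cyrillic U+0430.
patch = "allow = 1\n\u0430llow = 0\n"
for line, col, name in non_ascii_identifiers(patch):
    # ascii() shows the escape, so the odd character is visible in the report.
    print(line, col, ascii(name))
```

This only helps if every reviewer remembers to run it on every patch,
which is itself part of the added burden.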

-- ?!ng