[Python-3000] Support for PEP 3131

Adam Olsen rhamph at gmail.com
Fri May 25 20:16:46 CEST 2007

On 5/25/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 5/25/07, Adam Olsen <rhamph at gmail.com> wrote:
> > If we allowed an underscore as a mixed-script separator
> > (allowing "def get_原料(self):"), does this let us get away
> > with otherwise banning mixed-scripts?
> I wondered that, until seeing that it wouldn't really solve the
> problem anyhow.  It is possible to write entire words (such as "allow"
> or "scope") in multiple scripts.  (Unicode calls these "whole script
> confusables".)  You can't stop that without banning one of the scripts
> entirely, which would disenfranchise users of some languages.
> So I think the least-bad solution is to say "OK, we won't allow these
> potentially confusable characters unless you were expecting them."
> And once we have a way to say "I'm expecting Cyrillic", we might as
> well let the user specify exactly what they're expecting, and make
> their own decisions on what is likely to be needed vs likely to be
> confused.

Indeed, whole-script confusables do create significant holes, but I
think the best solution is still to ban mixed scripts and accept that
it's only a "75% solution".  Using an "I'm expecting Cyrillic" flag
makes it harder for those who need Cyrillic AND still leaves them
vulnerable to the same problem we're trying to protect ourselves from.
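
As a rough illustration of the rule above (ban mixed scripts, but treat
an underscore as a separator), here is a minimal Python sketch.  The
names script_of and is_single_script_per_segment are made up for this
example, and guessing the script from the first word of the Unicode
character name is only a stand-in assumption, since the stdlib
unicodedata module does not expose a real script property.

```python
import unicodedata


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.

    Assumption: this is a heuristic stand-in for a real script property.
    Digits and underscores are treated as script-neutral (None).
    """
    if ch.isdigit() or ch == "_":
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_single_script_per_segment(identifier):
    """True if each underscore-separated segment uses at most one script."""
    for segment in identifier.split("_"):
        scripts = {script_of(ch) for ch in segment} - {None}
        if len(scripts) > 1:
            return False
    return True


print(is_single_script_per_segment("get_\u539f\u6599"))  # True: Latin + CJK, separated
print(is_single_script_per_segment("get\u539f\u6599"))   # False: mixed in one segment
print(is_single_script_per_segment("s\u0441ope"))        # False: Cyrillic "es" inside Latin
# An all-Cyrillic look-alike of "scope" still passes.
print(is_single_script_per_segment("\u0455\u0441\u043e\u0440\u0435"))  # True
```

The last call shows the whole-script hole: an all-Cyrillic look-alike
of "scope" is accepted, which is why this is only a partial solution.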

A more extreme solution would be to introduce a symbol type that
converts whole-script confusables to a canonical form (as well as
mixed-script confusables, if we don't ban them).  To be practical, it
would have to coerce any unicode string it was compared with for
equality, and would probably not support sorting.
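
To make the idea concrete, here is a minimal sketch of what such a
symbol type might look like in Python.  Everything in it is
hypothetical: the Symbol class, the skeleton helper, and the tiny
six-entry mapping are assumptions for illustration only; a real
implementation would need the full Unicode confusables data.

```python
# Hypothetical, deliberately tiny confusables table (Cyrillic -> Latin).
_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}


def skeleton(text):
    """Map each known confusable character to its canonical look-alike."""
    return "".join(_CONFUSABLES.get(ch, ch) for ch in text)


class Symbol:
    """Identifier-like value compared by canonical form; no ordering defined."""

    __slots__ = ("_canonical",)

    def __init__(self, text):
        self._canonical = skeleton(text)

    def __eq__(self, other):
        if isinstance(other, Symbol):
            return self._canonical == other._canonical
        if isinstance(other, str):
            # Coerce plain strings to canonical form before comparing.
            return self._canonical == skeleton(other)
        return NotImplemented

    def __hash__(self):
        return hash(self._canonical)

    def __repr__(self):
        return "Symbol(%r)" % self._canonical


cyrillic_scope = Symbol("\u0455\u0441\u043e\u0440\u0435")
print(cyrillic_scope == Symbol("scope"))  # True
print(cyrillic_scope == "scope")          # True
```

Since no ordering methods are defined, comparing two symbols with <
raises TypeError.  One limitation of this sketch: a plain string that
compares equal to a Symbol may still hash differently, so mixing the
two as dict keys would not behave consistently.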

Adam Olsen, aka Rhamphoryncus

