[Python-3000] Support for PEP 3131
Adam Olsen
rhamph at gmail.com
Fri May 25 20:16:46 CEST 2007
On 5/25/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 5/25/07, Adam Olsen <rhamph at gmail.com> wrote:
> > If we allowed an underscore as a mixed-script separator
> > (allowing "def get_原料(self):"), does this let us get away
> > with otherwise banning mixed-scripts?
>
> I wondered that, until seeing that it wouldn't really solve the
> problem anyhow. It is possible to write entire words (such as "allow"
> or "scope") in multiple scripts. (Unicode calls these "whole script
> confusables".) You can't stop that without banning one of the scripts
> entirely, which would disenfranchise users of some languages.
>
> So I think the least-bad solution is to say "OK, we won't allow these
> potentially confusable characters unless you were expecting them."
>
> And once we have a way to say "I'm expecting Cyrillic", we might as
> well let the user specify exactly what they're expecting, and make
> their own decisions on what is likely to be needed vs likely to be
> confused.
Indeed, whole-script confusables do create significant holes, but I
think the best solution is still to ban mixed scripts and accept that
it's only a "75% solution". Using an "I'm expecting Cyrillic" flag
makes things harder for those who need Cyrillic AND still leaves them
vulnerable to the same problem we're trying to protect ourselves from.
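
To make the "ban mixed scripts, but treat underscore as a separator"
idea concrete, here is a minimal sketch. It is not a proposed
implementation. The function names are hypothetical, and since the
unicodedata module has no script property, the script is guessed from
the first word of the Unicode character name, which is only a rough
approximation (it would, for example, treat Hiragana and CJK
ideographs as different scripts).

```python
import unicodedata


def script_of(ch):
    """Rough script guess (assumption): first word of the character name."""
    if ch.isdigit():
        return None  # treat digits as script-neutral
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def is_single_script(segment):
    """True if every character in the segment appears to share one script."""
    scripts = {script_of(ch) for ch in segment} - {None}
    return len(scripts) <= 1


def identifier_allowed(identifier):
    """Allow mixed scripts only across underscore-separated segments."""
    return all(is_single_script(seg) for seg in identifier.split("_"))
```

Under these assumptions, identifier_allowed("get_原料") is accepted,
and "\u0430llow" (a Cyrillic "а" followed by Latin "llow") is rejected.
The all-Cyrillic "\u0441\u043e\u0440\u0435", which looks like "cope",
is still accepted, which is the hole left by whole-script confusables.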
A more extreme solution would be to introduce a symbol type that
converts whole-script confusables to a canonical form (as well as
mixed-script confusables, if we don't ban them). To be practical, it
would have to coerce any unicode it was compared with for equality,
and it would probably not support sorting.
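
A rough sketch of what such a symbol type might look like follows.
The class name Symbol and the CONFUSABLES table are hypothetical, and
the table is deliberately tiny (five Cyrillic letters mapped to their
Latin look-alikes). A real version would need a full confusables
mapping, such as the data Unicode publishes.

```python
# Hypothetical, deliberately tiny table: Cyrillic letters -> Latin look-alikes.
CONFUSABLES = str.maketrans({
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
})


class Symbol:
    """Identifier-like value compared by its canonical (folded) form."""

    def __init__(self, text):
        self.text = text
        self._canonical = text.translate(CONFUSABLES)

    def __eq__(self, other):
        if isinstance(other, Symbol):
            return self._canonical == other._canonical
        if isinstance(other, str):
            # Coerce plain strings to canonical form before comparing.
            return self._canonical == other.translate(CONFUSABLES)
        return NotImplemented

    def __hash__(self):
        return hash(self._canonical)

    def __repr__(self):
        return f"Symbol({self.text!r})"

    # No ordering methods are defined, so sorting Symbols raises TypeError.
```

With this sketch, Symbol("\u0441\u043e\u0440\u0435") compares equal to
both Symbol("cope") and the plain string "cope". Note that the hash
only agrees with equality between Symbols: a plain Cyrillic string can
compare equal to a Symbol while hashing differently, so mixing the two
types as dict keys would not behave consistently.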
--
Adam Olsen, aka Rhamphoryncus