[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Wed Oct 26 01:59:51 CEST 2005

"Martin v. Löwis" <martin at v.loewis.de> wrote:
> 
> Josiah Carlson wrote:
> > And how users could say, "name error? But I typed in window.draw(PEN) as
> > I was told to, and it didn't work!"
> 
> Ah, so the "serious issues" you are talking about are not security 
> issues, but usability issues.

Indeed, it was a misunderstanding, as the email stated:
    I did not mean to imply that I was concerned about the security
    implications of inserting arbitrary identifiers in Python (I was
    mentioning the web browser case for an example of how such
    characters have been confusing previously), I am concerned about
    confusion involved with using: [glyphs which are identical]

> I don't think extending the range of acceptable characters will
> cause any additional confusion. Users are already getting "surprising"
> NameErrors/AttributeErrors in the following cases:
> - they just misspell the identifier, and then, when the error message
>    is printed, fail to recognize the difference, as they read over the
>    typo just like they read over it when mistyping it in the first place.

In this case it's not just a misreading; the characters look identical!
When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying which
characters will or will not be used as identifiers, when those
characters are keys on a keyboard of a specific type, is pretty
presumptuous.
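
To make the 'E' case concrete, here is a minimal sketch. It assumes a
present-day Python 3 interpreter and its standard unicodedata module,
which is not what this thread was written against; the names LOOKALIKES
and describe are made up for illustration.

```python
import unicodedata

# Three capital letters that many fonts draw with the same glyph:
# Latin E, Greek Epsilon, Cyrillic Ie (illustrative list, not exhaustive).
LOOKALIKES = ["\u0045", "\u0395", "\u0415"]


def describe(ch):
    """Return (code point, official Unicode name) for one character."""
    return "U+%04X" % ord(ch), unicodedata.name(ch)


# Print only the code point and name, since the glyphs themselves
# would be indistinguishable on screen.
for ch in LOOKALIKES:
    print(*describe(ch))

# The letters are three distinct characters, and compatibility
# normalization (NFKC) does not fold them together.
assert len(set(LOOKALIKES)) == 3
assert len({unicodedata.normalize("NFKC", ch) for ch in LOOKALIKES}) == 3

# A name spelled with a Greek Epsilon is therefore a different string
# from the all-Latin "PEN", even though both may render alike.
assert "PEN" != "P\u0395N"
```

The point of the sketch is only that the strings compare unequal while
looking the same; it says nothing about how often this would happen in
practice.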

> - they run into confusions with different things having the same names
>    in different contexts. For example, they wonder why they get TypeError
>    for passing the wrong number of arguments to a function, when the
>    call matches exactly what the source code in front of them tells
>    them - only that they were calling a different function which just
>    happened to have the same name.

Right, and users should be reading the documentation for the functions
and methods they are calling.

> In the light of these common mistakes, your example with an identifier
> named PEN, where the "P" might be a cyrillic letter or the E a greek one
> is just made up: For window.draw, people will readily understand that
> they are supposed to use Latin letters. More generally, they will know
> what script to use just from looking at the identifier.

Sure, that example was made up, but there are words which have been
stolen from various languages by English, and you are discounting the
case of single-letter temporary variables.  Saying what will and won't
happen over the course of using Unicode identifiers is quite the
prediction.

> > Identically drawn glyphs are a problem, and pretending that they aren't
> > a problem, doesn't make it so.  Right now, all possible name glyphs are
> > visually distinct
> 
> Not at all: Just compare Fool and Foo1 (and perhaps FooI)
> 
> In the font in which I'm typing this, these are slightly different - but
> there are fonts in which the difference is really difficult to
> recognize.

Indeed, they are similar, but _different_ in my font as well.  The trick
is that the glyphs are not different in the case of certain Greek or
Cyrillic letters.  They don't just /look/ similar, they /are identical/.

 - Josiah