[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Oct 27 04:14:13 CEST 2005


Martin v. Löwis wrote:

> Not in the literal sense: you certainly want to allow
> "latin" digits in, say, a cyrillic identifier.

Yes, by "alphabet" I really only meant the letters,
although you might want to apply the same idea to
clusters of digits within an identifier, depending
on how potentially confusable the various sets of
digits are -- I'm not familiar enough with alternative
digit sets to know whether that would be a problem.
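
To make the "one alphabet per identifier" idea concrete, here is a
rough sketch. It is only an illustration: the function names are
made up, and it assumes that the first word of a character's Unicode
name ("LATIN", "CYRILLIC", "GREEK", ...) is a good enough stand-in
for its alphabet, which will not hold for every character. Digits
and underscores are ignored, so "latin" digits in a cyrillic
identifier pass.

```python
import unicodedata

def letter_scripts(identifier):
    """Return the set of rough alphabet names used by the letters.

    Assumption: the first word of the Unicode character name
    approximates the alphabet. Non-letters (digits, underscores)
    are skipped entirely.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for characters without a name.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_single_alphabet(identifier):
    """True if all letters in the identifier share one alphabet."""
    return len(letter_scripts(identifier)) <= 1
```

With this, an all-cyrillic name followed by "1" would be accepted,
while a latin name with one cyrillic letter slipped into it would
be rejected.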

> Just because
> you *can* come up with look-alike identifiers doesn't
> mean that people will use them, or that they will mistake
> the scripts (except for deliberately doing so, of
> course).

I still think this is a much worse potential problem
than that of "l" vs "1", etc. It's reasonable to
adopt the practice of never using "l" as a single-letter
identifier, for example. But it would be
unreasonable to ban the use of "E" as an identifier
on the grounds that someone somewhere might confuse
it with a capital epsilon.

An alternative would be to identify such confusable
letters in the various alphabets and define them
to be equivalent.
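
As a sketch of what "define them to be equivalent" might look
like: fold each confusable letter to one representative before
comparing names. The table below is a tiny hand-picked sample that
I made up for illustration, not a complete or authoritative list,
and the function names are hypothetical.

```python
# Hypothetical sample table: look-alike capitals folded to Latin.
CONFUSABLE = {
    "\u0395": "E",  # GREEK CAPITAL LETTER EPSILON
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}

def canonical(identifier):
    """Replace each listed look-alike with its representative."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in identifier)

def same_identifier(a, b):
    """Compare two identifiers after folding confusable letters."""
    return canonical(a) == canonical(b)
```

Here "E" and a capital epsilon are still different strings, but
same_identifier() treats them as the same name.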

And beyond the issue of alphabets there's also the
question of whether accented characters should be
considered distinct. I can see quite a few holy
flame wars erupting over that...
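
The two obvious policies can be sketched with the unicodedata
module. This is an illustration only, with made-up function names:
one policy keeps accents significant but treats precomposed and
decomposed spellings alike, the other ignores accents altogether.

```python
import unicodedata

def accent_sensitive_key(identifier):
    """Accents stay significant; only the encoding is unified (NFC)."""
    return unicodedata.normalize("NFC", identifier)

def accent_insensitive_key(identifier):
    """Accents are dropped: decompose, then remove combining marks."""
    decomposed = unicodedata.normalize("NFD", identifier)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))
```

Under the first policy "cafe" and "café" are different names;
under the second they collide, which is exactly the sort of thing
people would argue about.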

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

