[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Thu Oct 27 11:09:04 CEST 2005

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>You even argued against having non-ASCII identifiers:
>>
>>http://mail.python.org/pipermail/python-list/2002-May/102936.html
> 
> 
> I see :-) It seems I have changed my mind since then (which
> apparently predates PEP 263).
> 
> One issue I apparently was worried about was the plan to use
> native-encoding byte strings for the identifiers; this I didn't
> like at all.
> 
> 
>>* Unicode identifiers are going to introduce massive
>>code breakage - just think of all the tools people use
>>to manipulate Python code today; I'm quite sure that
>>most of it will fail in one way or another if you present
>>it Unicode literals such as in "zähler += 1".
> 
> 
> True. Today, I think I would be willing to accept the
> code breakage: these tools had quite some time to update
> themselves to PEP 263 (even though not all of them have
> done so yet); also, usage of the feature would only spread
> gradually. A failure to support the feature in Python
> proper would be treated as a bug by us; how tool providers
> deal with the feature would be their choice.

I was thinking of introspection and debugging tools.
These would then see Unicode objects in the namespace
dictionaries, and this will likely break a lot of code -
for much the same reason you see code breakage now
if you let Unicode objects enter the Python standard lib
without warning :-)
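
To make the concern concrete, here is a minimal sketch. It assumes an
interpreter that accepts non-ASCII identifiers (the Python under
discussion here does not), and ascii_only_names is a made-up stand-in
for an existing tool, not any real tool's API:

```python
import re

# Hypothetical stand-in for an older source-manipulation tool: it assumes
# identifiers consist only of ASCII letters, digits and underscores.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ascii_only_names(source):
    """Return the identifier-like tokens an ASCII-only tool would find."""
    return ASCII_IDENTIFIER.findall(source)


# "\u00e4" is the letter a-umlaut, so this is the "zähler += 1" example.
source = "z\u00e4hler = 0\nz\u00e4hler += 1\n"

# The tool splits the name at the non-ASCII character.
found = ascii_only_names(source)

# Assumption: this interpreter accepts non-ASCII identifiers, so the
# namespace dictionary ends up with a non-ASCII key.
namespace = {}
exec(source, namespace)
non_ascii_keys = [key for key in namespace
                  if any(ord(ch) > 127 for ch in key)]
```

In this sketch the tool sees the fragments "z" and "hler" instead of
one name, while an introspection tool walking the namespace finds a
key it may not be prepared to print or encode.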

>>* People don't seem very interested in using Unicode
>>identifiers, e.g.
>>
>>  http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html
> 
> 
> True. However, I also suspect that lack of tool support
> contributes to that. For the specific case of Java,
> there is no notion of source encoding, which makes Unicode
> identifiers really tedious to use.
> 
> If it were really easy to use, I assume people would actually
> use it - at least in some of the contexts, like teaching,
> where Python is also widely used.

Well, that has two sides: Of course, you'll always find
some people who will like a certain feature. The question
is what effect it has on the rest of us.

Python has always put some constraints on programmers
to raise code readability, e.g. white space awareness.
Giving them Unicode identifiers sounds like a step
backwards in this context.

Note that I'm not talking about comments, string literal
contents, etc. - only the programming logic, i.e. keywords
and identifiers.

>>Do you really think that it will help with code readability
>>if programmers are allowed to use native scripts for their
>>identifiers ?
> 
> 
> Yes, I do - for some groups of users. Of course, code sharing
> would be more difficult, and there certainly should be a policy
> to use only ASCII in the standard library. But within local
> groups, users would find understanding code easier if they
> knew what the identifiers actually meant.

Hmm, but why do you think they wouldn't understand the meaning of
ASCII versions of the identifiers ?

Note that using ASCII doesn't necessarily mean that you
have to use English as basis for the naming schemes of
identifiers.

>>If you are told to debug a program
>>written by say a Japanese programmer using Japanese identifiers
>>you are going to have a really hard time. Integrating such
>>code into other applications will be even harder, since you'd
>>be forced to use his Japanese class names in your application.
> 
> 
> Certainly, yes. There is a trade-off: you can make it easier
> for some people to read and write code if they can use their
> native script; at the same time, it would be harder for others
> to read and modify it.
> 
> It's a policy decision whether you use English identifiers or
> not - it shouldn't be a technical decision (as it currently
> is).

See above: ASCII != English. Most scripts have a transliteration
into ASCII - simply because ASCII is the global standard for
scripts.
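
As a rough illustration of what such a transliteration might look
like, here is a sketch for German. The table and the function name
to_ascii_identifier are made up for this example and are not an
established standard; other scripts would need their own romanization
scheme:

```python
import unicodedata

# Illustrative, hand-written table for German only.
GERMAN_TABLE = {
    "\u00e4": "ae",  # a-umlaut
    "\u00f6": "oe",  # o-umlaut
    "\u00fc": "ue",  # u-umlaut
    "\u00c4": "Ae",  # A-umlaut
    "\u00d6": "Oe",  # O-umlaut
    "\u00dc": "Ue",  # U-umlaut
    "\u00df": "ss",  # sharp s
}


def to_ascii_identifier(name):
    """Transliterate a name into ASCII using the table above."""
    # Compose first so "a" + combining diaeresis also matches the table.
    composed = unicodedata.normalize("NFC", name)
    replaced = "".join(GERMAN_TABLE.get(ch, ch) for ch in composed)
    # Fallback: decompose what is left and drop non-ASCII code points.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this table, "zähler" would become "zaehler", which stays German
in meaning while remaining plain ASCII in the source.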

>>I think source code encodings provide an ideal way to
>>have comments written in native scripts - and people
>>use that a lot. However, keeping the program code itself
>>in plain ASCII makes it far more readable and reusable
>>across locales. Something that's important in this
>>globalized world.
> 
> 
> Certainly. However, some programs don't need to live in
> a globalized world - e.g. if they are homework in a school.
> Within a locale, using native scripts would make the program
> more readable.

True.

-- 
Marc-Andre Lemburg
eGenix.com
