[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Wed Oct 26 19:32:52 CEST 2005

M.-A. Lemburg wrote:
> You even argued against having non-ASCII identifiers:
> 
> http://mail.python.org/pipermail/python-list/2002-May/102936.html

I see :-) It seems I have changed my mind since then (which
apparently predates PEP 263).

One issue I apparently was worried about was the plan to use
native-encoding byte strings for the identifiers; this I didn't
like at all.

> * Unicode identifiers are going to introduce massive
> code breakage - just think of all the tools people use
> to manipulate Python code today; I'm quite sure that
> most of it will fail in one way or another if you present
> it Unicode literals such as in "zähler += 1".

True. Today, I think I would be willing to accept the
code breakage: these tools had quite some time to update
themselves to PEP 263 (even though not all of them have
done so yet); also, usage of the feature would only spread
gradually. A failure to support the feature in Python
proper would be treated as a bug by us; how tool providers
deal with the feature would be their choice.

> * People don't seem very interested in using Unicode
> identifiers, e.g.
> 
>   http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html

True. However, I also suspect that lack of tool support
contributes to that. For the specific case of Java,
there is no notion of source encoding, which makes Unicode
identifiers really tedious to use.

If it were really easy to use, I assume people would actually
use it - at least in some of the contexts, like teaching,
where Python is also widely used.

> Do you really think that it will help with code readability
> if programmers are allowed to use native scripts for their
> identifiers ?

Yes, I do - for some groups of users. Of course, code sharing
would be more difficult, and there certainly should be a policy
to use only ASCII in the standard library. But within local
groups, users would find understanding code easier if they
knew what the identifiers actually meant.
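
An ASCII-only policy of this kind could be checked by a tool rather
than enforced by the language. A minimal sketch, assuming a Python
version whose tokenize module accepts non-ASCII identifiers (the
Python discussed in this thread does not); the function name
non_ascii_identifiers is made up for illustration:

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (name, line, column) for every identifier in `source`
    that contains a non-ASCII character.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens cover identifiers and keywords; keywords are
        # always ASCII, so they never match. Comments and string
        # literals have their own token types and are ignored.
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            hits.append((tok.string, tok.start[0], tok.start[1]))
    return hits
```

A project wanting ASCII-only code would reject any file for which
this returns a non-empty list; a local group that prefers native
identifiers would simply not run the check.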

> If you are told to debug a program
> written by say a Japanese programmer using Japanese identifiers
> you are going to have a really hard time. Integrating such
> code into other applications will be even harder, since you'd
> be forced to use his Japanese class names in your application.

Certainly, yes. There is a trade-off: you can make it easier
for some people to read and write code if they can use their
native script; at the same time, it would be harder for others
to read and modify it.

It's a policy decision whether you use English identifiers or
not - it shouldn't be a technical decision (as it currently
is).

> I think source code encodings provide an ideal way to
> have comments written in native scripts - and people
> use that a lot. However, keeping the program code itself
> in plain ASCII makes it far more readable and reusable
> across locales. Something that's important in this
> globalized world.

Certainly. However, some programs don't need to live in
a globalized world - e.g. if they are homework in a school.
Within a locale, using native scripts would make the program
more readable.

Regards,
Martin