[Python-ideas] Visually confusable unicode characters in identifiers

Guido van Rossum guido at python.org
Mon Oct 1 19:44:42 CEST 2012


On Mon, Oct 1, 2012 at 10:02 AM, Mathias Panzenböck
<grosser.meister.morti at gmx.net> wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>
>>> I still don't understand why unicode characters are allowed at all in
>>> identifier
>>> names. Is the reason for this written down somewhere?
>>
>>
>> http://www.python.org/dev/peps/pep-3131/#rationale
>>
>
> But the Python keywords and, more importantly, the documentation are in
> English. Don't you need to be able to speak/write English in order to code
> Python anyway? And if you keep your code+comments English you can access a
> much larger developer pool (all developers who speak English should by my
> hypothesis be a superset of all developers who speak a certain language).

Hi Mathias,

Your objections go pretty much exactly along the lines of my original
resistance to this proposal (which was proposed many times before it
got to be a PEP). What finally made me change my mind was talking to
educators who were teaching Python in countries where English is not
the primary language and the primary language is not even related to
English (e.g. Chinese or Japanese).

Teaching the students the necessary language keywords and standard
library names is not that difficult; even if English *is* your primary
language you have to learn what they mean in the context of
programming. (Example: "print" comes from a very ancient mode of using
computers where the only form of output was through a physical
printer.)

But these students often have a very limited English vocabulary, and
their science and math classes (which are often useful starting points
for programming exercises) are usually taught in the native language.
So when teachers show students example programs it helps if they can
name e.g. their variables and functions in the native language.
Comments are also often written in the native language. Here, it
really helps if the students can type their native language directly
rather than having to use the Latin transcription (even if they often
also have to learn the latter, for unrelated pragmatic reasons).

From your name and email it sounds like your native language might be
German. Like me, you probably take pride in your English skills and
like me, you write all your code using English for identifiers and
comments. However, for students just learning to program and not yet
well-versed in English, that would be like trying to teach them
multiple things at once. It may work for the smartest students, but it
probably would be unnecessarily off-putting for many others.

As an example in German, I found a Python book aimed at middle- and
high-schoolers written in German, Python für Kids. You can look inside
it on the Amazon website:
http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514
-- the examples use German words for most module and variable names.
Luckily German limited to ASCII is still fairly readable ("fuer"
instead of "für" etc.), so Unicode is not strictly needed for this
case -- but you can understand that in languages whose native script
is not the Latin alphabet, Unicode is essential for the same style of
introduction.
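
To make the contrast concrete, here is a minimal sketch of the same
small exercise written with an ASCII-only German name and with a Unicode
identifier, as PEP 3131 allows in Python 3. It is not taken from the
book; the function and variable names are invented for illustration.

```python
import math

# ASCII-only transliteration: still readable in German ("ae" for "ä").
def kreisflaeche(radius):
    return math.pi * radius ** 2

# The same function with a Unicode identifier (hypothetical name).
def kreisfläche(radius):
    return math.pi * radius ** 2

# A hypothetical Japanese variable name ("radius"); there is no natural
# ASCII spelling for it short of a Latin transcription.
半径 = 2.0

# Both spellings compute the same area.
print(kreisflaeche(半径) == kreisfläche(半径))
```

For German the first form is a tolerable workaround; for the Japanese
name the only ASCII alternative is a transcription such as "hankei".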

I'm sure there are also examples beyond education -- e.g. in a program
for calculating Dutch taxes I would use the Dutch names for the
various technical terms naming concepts in Dutch tax law, and again,
in the case of the Dutch language that doesn't require Unicode, but
for many other languages it would.

I hope this helps. (Also note, as the PEP states explicitly, that the
Python standard library should use only ASCII and English for
identifiers and comments, except in those unit tests that are
specifically testing the Unicode identifiers feature.)

-- 
--Guido van Rossum (python.org/~guido)


