[Python-Dev] Re: PEP: Defining Unicode Literal Encodings (revision 1.1)

M.-A. Lemburg mal@lemburg.com
Sat, 14 Jul 2001 20:52:21 +0200

Roman Suzi wrote:
> On Sat, 14 Jul 2001, M.-A. Lemburg wrote:
> >> #!/usr/bin/python
> >> # -*- coding=utf-8 -*-
> >> ...
> >
> >I already mentioned allowing directives in comments to work around
> >the problem of directive placement before the first doc-string.
> >
> >The above would then look like this:
> >
> >#!/usr/local/bin/python
> ># directive unicodeencoding='utf-8'
> >u""" UTF-8 doc-string """
> >
> >The downside of this is that parsing comments breaks the current
> >tokenizing scheme in Python: the tokenizer removes comments before
> >passing the tokens to the compiler ...wouldn't be hard to
> >fix though ;-) (note that tokenize.py does not)
> BTW, it is possible to write variable names in national alphabet
> is locale is set. But I do not know if this is side-effect
> which will be corrected or behaviour one can rely on ;-)

It is a side-effect of Python relying on the isalpha() C API.
I wouldn't count on it though since it is not compliant to 
the Python reference and other Python implementations may
very well not offer this possibility.
> It could be also nice to be able replace keywords with localised ones.
> Python remains nice even after translating into Russian.

Eek, no please !  VisualBasic went down that road and
backed out again... it's simply a complete nightmare.

> This + mending broken IDLE (which doesn't allow to enter cyrillic) will
> allow beginners to think and write. Currently "writing while thinking"
> works only for those who think in English ;-)
> And such a move opens Python to secondary schools. For example, Logo has
> national variants without any losses. Why Python, also targeted for
> education requires to use English?
> And unicoding (utf-8-ing) Python source could be the solution.
> What do you think?

I personally think that programs should always be written in ASCII 
and all national language string literals be moved out into
gettext() (or similar) support files. Of course, for beginners
and small projects this is overkill, so the proposed Unicode literal 
variant might help... which is why I wrote the PEP -- adding this
support to Python is really simple and does not require a 
major rewrite of the tokenizer/compiler components in Python.

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/