[Python-3000] Support for PEP 3131

Mon Jun 4 21:43:13 CEST 2007

Steve Howell <showell30 at yahoo.com> wrote:
> --- Michael Urman <murman at gmail.com> wrote:
> > 
> > The arguments then feel reduced to "Unicode enhances
> > readability" vs.
> > "Unicode impedes readability" and since clearly it
> > does both, how do
> > we make the value judgement about which it does
> > more? How do we weigh
> > the ability to use native language identifiers
> > against the risk that
> > there will be visually indistinguishable differences
> > introduced?
> > 
> 
> I think offering some Unicode examples will enhance
> the "Unicode enhances readability" argument.  Martin
> recently posted a small example program written in
> German.  As a German non-reader, I still found it
> pretty easy to read, with a little bit of effort. 
> Interestingly, the one word that I wasn't able to
> translate, even with the help of Babelfish, was the
> German word for "insert."  It turns out the thing that
> threw me off was that I omitted the umlaut.  That was
> a bit of an epiphany for me.

Maybe I'm worse with languages than other people are; it wouldn't
surprise me terribly.  I had some difficulty, primarily because I didn't
try to translate it (translating would be quite impractical with longer
programs and other languages).

Here is some code borrowed right from the Python standard library.  I've
gone ahead and mangled names in a consistent fashion using the tokenize
module.  Can you guess what it does?

class RTrCOlOrB :

    nBBjIUrB =0 

    def __init__ (self ,uX ,nBBjIUrB =1 ):
        self .uX =uX 
        self .nCIZj =[]# KAzWn ezWQ
        self .rBGBr =0 
        self .rInC =0 
        if nBBjIUrB :
            self .nBBjIUrB =1 
            self .nCIAC =self .uX .tell ()
            self .XznnCIZj =[]# KAzWn ezWQ

    def tell (self ):
        if self .rBGBr >0 :
            return self .rInCXzn 
        return self .uX .tell ()-self .nCIAC 

    def nBBj (self ,Xzn ,WDBQZB =0 ):
        DBAB =self .tell ()
        if WDBQZB :
            if WDBQZB ==1 :
                Xzn =Xzn +DBAB 
            elif WDBQZB ==2 :
                if self .rBGBr >0 :
                    Xzn =Xzn +self .rInCXzn 
                else :
                    raise Error ,"ZIQ'C TnB WDBQZB=2 yBC"
        if not 0 <=Xzn <=DBAB or self .rBGBr >0 and Xzn >self .rInCXzn :
            raise Error ,'UIe RTrCOlOrB.nBBj() ZIrr'
        self .uX .seek (Xzn +self .nCIAC )
        self .rBGBr =0 
        self .rInC =0 
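
For anyone who wants to repeat the experiment, a mangling of this kind
can be sketched roughly as below.  This is not the script that produced
the listing above; mangle_names, its keep and seed parameters, and the
sample function are names made up for illustration, and the sketch
assumes a current Python 3.  It only knows about hard keywords, so code
using soft keywords such as match would need extra care.

```python
import io
import keyword
import random
import string
import tokenize


def mangle_names(source, keep=(), seed=0):
    """Consistently replace identifiers in `source` with random letters.

    Returns (mangled_source, mapping).  Keywords, names starting with a
    double underscore, and anything listed in `keep` are left alone.
    """
    rng = random.Random(seed)
    taken = set(keep)      # names a replacement must not collide with
    mapping = {}           # original name -> mangled name

    def fresh_name(length):
        while True:
            candidate = "".join(rng.choice(string.ascii_letters)
                                for _ in range(length))
            if candidate not in taken and not keyword.iskeyword(candidate):
                taken.add(candidate)
                return candidate
            length += 1    # widen the search space after a clash

    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if (tok.type == tokenize.NAME
                and not keyword.iskeyword(text)
                and not text.startswith("__")
                and text not in keep):
            if text not in mapping:
                mapping[text] = fresh_name(len(text))
            text = mapping[text]
        tokens.append((tok.type, text))
    # Two-element tuples make untokenize() choose its own spacing.
    return tokenize.untokenize(tokens), mapping


# Hypothetical sample input; "area" is kept so it can still be called.
sample = (
    "def area(width, height):\n"
    "    result = width * height\n"
    "    return result\n"
)
mangled, mapping = mangle_names(sample, keep={"area"})
print(mangled)
```

The keep argument is there because names that belong to other objects
(tell and seek on a file object, for instance) have to survive for the
mangled code to keep working.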

> I hate to make a decision by majority rule, but I
> think there is the argument that you need to weigh the
> population of ascii-literate people vs.
> ascii-illiterate people. 

That's a very poor criterion, as not everyone in the world is a potential
programmer (despite what the BASIC folks tried to do). Further, of those
that become programmers in *any* substantial programming language today,
100% of them learn ascii. Even Java, which has been touted here as being
the premier language for allowing unicode identifiers (yes, a bit of
hyperbole), requires ascii to access the Java libraries.  This will be
the case for the foreseeable future in *any* programming language of
substantial use worldwide (regardless of what Python does regarding
unicode identifiers).

Since the PEP does not discuss the localization of every name in the
Python standard library (nor the builtins, __magic__ methods, etc.),
people are *still* going to need to learn the latin alphabet, at least
as much to distinguish and use Python keywords, builtins, and the
standard library.

With that said, the only question I believe that really matters in this
discussion is:
 * Where would you use unicode identifiers if they were available in
Python? Open source, closed source, personal projects?

Since everyone needs to learn ascii to use Python anyway, ascii will
continue to dominate wherever code is shared, regardless of potentially
substantial closed source and personal project use.  This has been seen
(according to various reports available on this list) in the Java
world*.

As for closed source or personal projects, as long as we offer people
the ability to use unicode identifiers (since PEP 3131 is accepted, this
will happen), I don't see that there is any problem being conservative
in our choice of default. If we discover that ascii defaults are wrong,
we can always add unicode defaults later. The converse is not the case.

As I have stated before: offer people the ability to easily add the
character sets that they want to see and allow to execute (I would be
happy to write an internationalizable interactive command-line and
wxPython interface for whatever method we choose), and those who want to
use non-ascii identifiers can do so.
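
To make the idea concrete, here is a minimal sketch of such an opt-in
check, assuming a current Python 3.  It is not part of PEP 3131 or of
any proposal on this list; rejected_identifiers, extra_allowed and the
GERMAN set are hypothetical names chosen for illustration.  ASCII is
always allowed, and anything else has to be added explicitly.

```python
import io
import tokenize


def rejected_identifiers(source, extra_allowed=""):
    """List (name, line) for identifiers using characters outside the policy.

    ASCII is always allowed; `extra_allowed` holds the additional
    characters the user has opted in to.
    """
    allowed = set(extra_allowed)
    rejected = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        if any(ord(ch) > 127 and ch not in allowed for ch in tok.string):
            rejected.append((tok.string, tok.start[0]))
    return rejected


GERMAN = "äöüßÄÖÜ"  # example opt-in set, chosen for illustration
source = "größe = 1\nsize = größe + 1\n"

print(rejected_identifiers(source))                        # ascii-only default
print(rejected_identifiers(source, extra_allowed=GERMAN))  # German opted in
```

With only the German characters opted in, an identifier containing a
Cyrillic letter that merely looks like a Latin one would still be
reported, which is the case the conservative default is meant to catch.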

 - Josiah

* There also seems to be a limited amount of information (available to
us) regarding how well known Java unicode identifiers are.  We hear
reports from some that no one knows of unicode identifiers, but then we
hear about closed Java source using them in China and abroad, and BJörn
Lindqvist saying that unicode identifiers were mentioned in the two
Swedish Java books he read.