[Python-3000] PEP 3131 accepted

Josiah Carlson jcarlson at uci.edu
Wed May 23 18:23:28 CEST 2007

"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> Josiah Carlson writes:
>  > From identical character glyph issues (which have been discussed
>  > off and on for at least a year),
> In my experience, this is not a show-stopping problem.

I never claimed that this, by itself, was a showstopper.

And my post should not be read as "these are all the problems that I
have seen with PEP 3131".  Those are merely the issues that have been
discussed over and over, and about which I (and seemingly others) am
still concerned, regardless of the hundreds of posts here and in
comp.lang.python seeking to convince us that "they are not a problem".

> Emacs/MULE has
> had it for 20 years because of the (horrible) design decision to
> attach charset information to each character in the representation of
> text.  Thus, MULE distinguishes between NO-BREAK SPACE and NO-BREAK
> SPACE (the same!) depending on whether the containing text "is" ISO
> 8859-15 or "is" ISO 8859-1.  (Semantically this is different from the
> identical glyph, different character problem, since according to ISO
> 8859 those characters are identical.  However, as a practical matter,
> the problem of detecting and dealing with the situation is the same as
> in MULE the character codes are different.)
> How does Emacs deal with this?  Simple.  We provide facilities to
> identify identical characters (not relevant to PEP 3131, probably), to
> highlight suspicious characters (proposed, not actually implemented
> AFAIK, since identification does what almost all users want), and to
> provide information on characters in the editing buffer.  The
> remaining problems with coding confusion are due to deficient
> implementation (mea maxima culpa).
> I consider this to be an editor/presentation problem, not a language
> definition issue.

This particular excuse pisses me off the most.  "If you can't
differentiate, then your font or editor sucks."  Thank you for passing
judgement on my choice of font or editor, but Ka-Ping already stated
why this argument is bullshit: there does not currently exist a font
in which one *can* differentiate all the glyphs, and further, even if
one could visually differentiate similar glyphs, *remembering* the
64,000+ glyphs available in just the primary Unicode plane well enough
to tell them apart is a herculean task.
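
To make the look-alike concern concrete, here is a minimal Python 3
sketch.  The identifier names are invented for illustration, and it
assumes an interpreter that accepts PEP 3131 identifiers:

```python
import unicodedata

latin = "a"          # U+0061 LATIN SMALL LETTER A
cyrillic = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A

# Most fonts draw these two the same way, but they are different code points.
print(latin == cyrillic)
for ch in (latin, cyrillic):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Two assignments that look like the same variable bind two distinct names.
namespace = {}
exec("p\u0430yload = 1\npayload = 2", namespace)
names = sorted(k for k in namespace if not k.startswith("__"))
print(len(names))
```

The comparison is false and the namespace ends up with two separate
names, even though a reader would most likely see one variable being
assigned twice.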

Never mind that people use dozens, perhaps hundreds, of different
editors to write and maintain Python code, which makes the 'Emacs
works' argument poor at best.  Heck, Thomas Bushnell made the same
argument when I spoke with him 2 1/2 years ago (though he also included
Vim as an alternative to Emacs); it smelled like bullshit then, and it
smells like bullshit now.

> Note that Ka-Ping's worry about the infinite extensibility of Unicode
> relative to any human being's capacity is technically not a problem.
> You simply have your editor substitute machine-generated identifiers
> for each identifier that contains characters outside of the user's
> preferred set (eg, using hex codes to restrict to ASCII), then review
> the code.  When you discover what an identifier's semantics are, you
> give it a mnemonic name according to the local style guide.
> Expensive, yes.  But cost is a management problem, not the kind of
> conceptual problem Ka-Ping claims is presented by multilingual
> identifiers.  Python is still, in this sense, a finitely generated
> language.

That's a bullshit argument, and you know it.  "Just use hex escapes"?
Modulo Unicode comments and strings, all Python programs are easily read
in default fonts available on every platform on the planet today.  But
with 3131, people accepting 3rd party code need to break 15+ years of
"what you see is what is actually there" by verifying the character
content of every identifier?  That's a silly and unnecessary workload
addition for anyone who wants to accept patches from 3rd parties, and
it relies on the same "your tools suck" argument to invalidate concerns
over Unicode glyph similarity.
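
For a sense of what "verifying the character content of every
identifier" would involve, here is a rough Python 3 sketch (3.7 or
later, for str.isascii).  The helper name non_ascii_identifiers and the
sample source are hypothetical, and this is only one possible way to
run such a check, not an existing tool:

```python
import io
import tokenize
import unicodedata


def non_ascii_identifiers(source):
    """Yield (line, name, escaped) for each identifier containing non-ASCII."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            # ASCII-only rendering with \uXXXX escapes, for review.
            escaped = tok.string.encode("unicode_escape").decode("ascii")
            yield tok.start[0], tok.string, escaped


# Hypothetical patch: line 2 uses CYRILLIC SMALL LETTER A in place of "a".
sample = "payload = 1\np\u0430yload = 2\nprint(payload)\n"
findings = list(non_ascii_identifiers(sample))
for line, name, escaped in findings:
    odd = [unicodedata.name(c) for c in name if ord(c) > 127]
    print(line, escaped, odd)
```

Only the second line of the sample is reported, shown in its escaped
form along with the Unicode name of the offending character.  Somebody
still has to run a check like this on every incoming patch and then
decide what each flagged name was meant to be.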

Speaking of which, do you know of a fixed-width font that allows
visual distinction of all Unicode glyphs in the primary plane, or even
of the portion that Martin is proposing we support?  The lack of one
also "is not a show-stopper", but it certainly reduces audience
satisfaction by a large margin.

>  > to editing issues (being that I write and maintain a Python editor)
> Multilingual editing (except for non-LTR scripts) is pretty much a
> solved problem, in theory, although adding it to any given
> implementation can be painful.  However, since there are many
> programmer's editors that can handle multilingual text already, that
> is not a strong argument against PEP 3131.

Another "your tools suck" argument.  While my editor has been able to
handle Unicode content for a couple years now (supporting all encodings
available to Python), properly supporting the entry of Unicode text in
any locale will necessitate the creation of charmap-like interfaces in
basically every editor.

But really, I'm glad that Emacs works for you and has solved this
problem for you.  I honestly tried to use it 4 years ago, spent a couple
weeks with it.  But it didn't work for me, and I've spent the last 4
years writing an editor because it and the other 35 editors I tried at
the time didn't work for me (as have dozens of others, for the exact
same reason).  But of course, our tools suck, and because we can't use
Emacs, we are already placed in a second-tier, ghettoized part of the
Python community of "people with tools that suck".

Thank you for hitting home that unless people use Emacs, their tools
suck.  I still don't believe that my concerns have been addressed. And I
certainly don't believe that those Ka-Ping brought up (which are better
than mine) have been addressed.  But hey, my tools suck, so obviously my
concerns regarding using my tools to edit Python in the future don't
matter.  Thank you for the vote of confidence.

 - Josiah
