[Python-3000] PEP 3131 accepted

Stephen J. Turnbull stephen at xemacs.org
Thu May 24 12:05:24 CEST 2007


Josiah Carlson writes:

 > Removing those words that some found offensive, perhaps I will get a
 > response to the point of my post: "your tools aren't very good" and
 > "Emacs does it right" are not valid responses to the concerns brought up
 > regarding unicode.

You're missing my point still, and I don't find the words offensive.
(It's a pain in the neck, since I already wrote my reply, but I'll
remove them too.)  Nor do I find your completely groundless conclusion
that I'm deprecating other tools offensive.

I find them to be an indicator of your fears which cannot be grounded
in any experience of mine---in exactly the kind of environment PEP
3131 will provide.  I strongly suspect you have no experience at all,
not even hearsay, to offer.  *Please* prove me wrong!  My experience
is *far* from definitive.

But if you can't, well, I don't blame you for your fear, but I also
cannot take it seriously as a reason to not implement this PEP in the
face of my own long experience.

 > but Ka-Ping already stated why this argument is invalid: there
 > does not currently exist a font where one *can* differentiate all
 > the glyphs,

I'll tell you why Ka-Ping's argument is a strawman.  First, one only
*needs* to be able to distinguish those characters that one can read.
It's nice to be able to admire the rest, of course, but you don't need
to see them as a speaker of that language would.  You just use a font
you like for the characters you can read, and the rest can be any old
dog.

Second, you do *not* need a single font with universal coverage.  I
typically use different fonts for Roman, Kanji, half-width kana, and
Hangul.  If I happen to have some Chinese in there, that will be yet
another font.  If I had cause to use Arabic, Hebrew, or Thai, they
would be yet other fonts.  It simply is not at all unpleasant to use
LucidaTypewriter for ASCII and Latin-1 in the same buffer with Sazanami
Gothic for Japanese.

N.B.  Martin is correct to point out the existence of the SIL BMP
fallback font, but that doesn't answer the real issue, that users
should use the fonts (and tools) they like best.

 > and further, even if one could visually differentiate similar

I have actually worked in an environment where you can't visually
distinguish different characters.  Security aside, it's a PITA, and
you *do* want tools to deal with it.  Those tools are *not* expensive;
simply audit the editor buffer for characters outside of the user's
acceptable set, and be 99% happy.  Once you've got tools, it's not a
big deal.  Can you find somebody with experience to say otherwise?
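As a rough illustration of how cheap such an audit is, here is a minimal Python sketch. It assumes the user's "acceptable set" is simply whatever a single codec can encode (ISO-8859-15 by default, the one-script case from the footnote). `is_acceptable` and `audit_buffer` are hypothetical names, not an existing editor or library API.

```python
import unicodedata


def is_acceptable(ch: str, codec: str = "iso8859_15") -> bool:
    """Assumption: 'acceptable' means representable in one single-script codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def audit_buffer(text: str, codec: str = "iso8859_15"):
    """Return (line, column, char, Unicode name) for every character outside
    the acceptable set. Line and column numbers are 1-based."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_acceptable(ch, codec):
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((lineno, col, ch, name))
    return findings


if __name__ == "__main__":
    # Second identifier uses CYRILLIC SMALL LETTER O (U+043E) in place of Latin 'o';
    # the two look identical in many fonts.
    source = "total = 1\nt\u043etal = 2\n"
    for lineno, col, ch, name in audit_buffer(source):
        print(f"{lineno}:{col}: U+{ord(ch):04X} {name}")
    # prints: 2:2: U+043E CYRILLIC SMALL LETTER O
```

Only the Cyrillic letter hiding in the second `total` is flagged. A user who actually reads Cyrillic would pass a codec (or swap in any other predicate) that covers it, which is the point: the check only has to know what *this* user can read.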

 > glyphs, *remembering* the 64,000+ glyphs that are available in just
 > the primary unicode plane to differentiate them, is a herculean
 > task.

Strawman.  The only people who need to remember the glyphs are those
who need to read them (or glyphs that look like them, cf. Ka-Ping's
example) anyway.  So they have already memorized them.

 > Never mind the fact that people use dozens, perhaps hundreds of
 > different editors to write and maintain Python code, that the
 > 'Emacs works' argument is poor at best.  It was invalid then, and
 > it is invalid now.

It was intended only to counter Ka-Ping's strawman of "impossible to
detect", and it demolishes that claim.

But addressing the content of what you write, you mean that, in a
world that allows multilingual identifiers, 'Emacs works' "smells
like" [from your original post] a threat to the market share of
editors that can't deal with multilingual identifiers, not to mention
the work habits of Emacs-haters everywhere, don't you?

Well, you're probably wrong.  *If* your users need to deal with
multilingual identifiers, *maybe* they'll prefer to switch to Emacs.
*If* they need extremely robust handling of multilingual identifiers
on a daily basis, they probably will switch to Emacs.

I doubt it, though.  What they'll probably do is write a five line
patch to get them 90% of the way to what Emacs gives them out of the
box, and be ecstatic that they don't have to use Emacs at all.
(That's a guess; as an XEmacs developer, I don't see much of that
activity.)

And that's a big "if".  Most of your users will never, in their
working lives, see code in a language that the current version of
your editor can't deal with, and 90% won't in the usable life of your
product.  That
I can tell you from experience.  Emacs has all these wonderful
multilingual features, but you know what?  95% of our users are
monoscript 100% of the time.[1]  90% of the rest use their primary
script 95% of the time.  Emacs being multilingual only means that the
one language might be Japanese or Thai.  If 99% of your users
currently use only ISO-8859-15, that isn't going to change by much just
because Python now allows Thai identifiers.

In other words, if you're up multilingual creek without a paddle,
Emacs will get you to shore.  Do you have a problem with it, put that
way?

 > That's an invalid argument, and you know it.  "Just use hex
 > escapes"?

No, my argument is not "just use hex escapes".  Please read it again,
and if you wish to respond to what I wrote, feel free.

So, you have my apologies, but I still advocate implementation of PEP
3131 over your objections, and those of Ka-Ping.

Footnotes: 
[1]  E.g., all Swiss know a half-dozen languages, but they can write all
of them with one script, ISO-8859-15.