[Python-Dev] forwarded message from Stephen J. Turnbull

Barry A. Warsaw barry@zope.com
Fri, 1 Mar 2002 14:02:29 -0500


--AkxAgBCuFJ
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


With permission from Stephen Turnbull, XEmacs' lead maintainer, I am
forwarding this response of his to my questions about XEmacs support
for the coding cookie.  Stephen asked me to include this preamble:

    So send it but please tag the forward with the gloss that the main
    content is (1) XEmacs will support the coding cookie for editing
    purposes, but (2) cookies can not be relied on in practice---it
    should be treated as a halfway house[1] for people who don't
    presently have the resources to convert to Unicode.

    Footnotes: 
    [1]  In the addiction rehabilitation sense.

I'm hoping that Stephen will soon be able to join the python-dev
discussions more directly, and I'm cc'ing him on this message.  I
admit to wearing the typical American sunglasses on this issue, MM2.1
notwithstanding.  I think Stephen's viewpoint and experience with
this issue are worth bringing up here.

-Barry


--AkxAgBCuFJ
Content-Type: message/rfc822
Content-Description: forwarded message
Content-Transfer-Encoding: 7bit

MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
References: <15483.59915.551669.250815@anthem.wooz.org>
Organization: The XEmacs Project
In-Reply-To: <15483.59915.551669.250815@anthem.wooz.org>
Message-ID: <87lmdfmpkd.fsf@tleepslib.sk.tsukuba.ac.jp>
Lines: 109
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Sender: "Stephen J. Turnbull" <steve@tleepslib.sk.tsukuba.ac.jp>
To: barry@zope.com (Barry A. Warsaw)
Subject: Re: forwarded message from M.-A. Lemburg
Date: 27 Feb 2002 15:00:50 +0900

>>>>> "BAW" == Barry A Warsaw <barry@zope.com> writes:

    BAW> The proposal

    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the
    programming environment rather unfriendly to Python users who live
    and work in non-Latin-1 locales such as many of the Asian 
    countries. Programmers can write their 8-bit strings using the
    favourite encoding, but are bound to the "unicode-escape" encoding
    for Unicode literals.

Hey, he's talking about me!  But I have no problems with this.

    BAW> adopts the Emacs convention of specifying the coding system
    BAW> via -*- lines.  Apparently this works on Emacs but not XEmacs;

<RANT TYPE="I suppose you couldn't have known" DEGREE="blind rage">

This is evil and should be opposed.  It is a pure crock pandering to
lazy people, guaranteed to make work and cause data loss (probably
almost all minor, but no guarantee of that) for those who follow.
There is a perfectly good 30-year-old standard, ISO 2022, for these
things---that has been nearly 100% ignored.  There's a perfectly good
20-year-old implementation, X Compound Text.  They not only work for
these limited purposes, but they're multilingual besides.  Why do it
badly again?

Look, it took 10 years to get Mule into GNU Emacs.  And you know
something?  For 10 years rms was right; it was when he changed his
mind and said, "ok, put it in without Unicode support" that he erred.
Many applications (Pine, Ghostscript, perl, sed, ad nauseam) have had
separately maintained Japanese versions for longer.  They never get
merged to mainline because they don't comply with standards, because
they don't have to.

Do you really want that for Python-based apps?

</RANT>

XML gets this right.  If you are going to have multilingual trees,
then they should be coded in Unicode.  Python surely will grok UTF-8
in .py files.  The examples in the tutorial seem to indicate it does.
If so, just tell them "UTF-8, because I said so."

We regularly get bug reports because Japanese programmers
automatically put an 'euc-jp coding cookie in every file.  No-mule
XEmacs bitches about the undefined symbol; people with Mule but
without Japanese fonts get a "can't instantiate font" warning.
American programmers will put an ISO 8859-1 cookie in a file, and some
European will add the Euro sign without fixing the cookie.  People
will cut and paste text in incompatible 8859 coding systems into a
file, and change the cookie to match the most recent one---making a
hash of everything else.  Not to mention that Microsoft applications
regularly lie about what they're writing (or maybe I'm giving them too
much credit; I've seen mail that starts with a MS-Unicode BOM and then
continues with ASCII HTML markup).
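
The Euro-sign case above can be sketched in a few lines.  This is only
an illustration, assuming a present-day Python 3 interpreter (which
postdates this thread); the variable names and the sample string are
hypothetical.  It shows a file whose cookie claims ISO 8859-1 while the
text was actually written as ISO 8859-15: decoding succeeds without any
error and quietly produces the wrong character.

```python
# Hypothetical illustration: the cookie says ISO 8859-1, but the author
# typed a Euro sign in an editor that was really writing ISO 8859-15.
declared = "iso-8859-1"    # what the coding cookie claims
actual = "iso-8859-15"     # what was really written to disk

# U+20AC (Euro sign) is the single byte 0xA4 in ISO 8859-15.
raw = "price: 5 \u20ac".encode(actual)

# Decoding with the declared encoding raises no error; byte 0xA4 is the
# generic currency sign U+00A4 in ISO 8859-1, so the text is silently wrong.
misread = raw.decode(declared)
print(repr(misread))

# The declared encoding cannot represent the Euro sign at all.
try:
    "\u20ac".encode(declared)
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

Nothing in the bytes themselves reveals the mistake, which is why a
stale cookie tends to go unnoticed until someone else reads the file.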

See also latin-unity.info, URL below.

As for the boneheads[1] at GNU, there are lots of things in GNU Emacs
I18N that shouldn't be.  Mule, for example.  But rms gave in to his
XE-nis envy, is the only way I can interpret it.  It's bad design, bad
code, bad documentation, and bad law.  (This doesn't apply to XEmacs;
Mule was put into XEmacs at the behest of Sun specifically to support
Japanese.)

    BAW> I use a MULE-ified 21.4.6 (haven't built an Emacs in a long
    BAW> time), and I cannot get it to work in XEmacs.

Nevertheless, it is supported in XEmacs, and is working in 21.4.6.
Except ... only in the first line.  Yep, there's explicit code to
restrict recognition to the first line.  No second or third line, no
trailing local variables section.  I don't have a problem with
extending to the first few lines, but the trailing local variables
section I oppose.

You guys absolutely definitely positively only ever need _first and
second line_ forever and ever promise cross your heart, right?
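
For reference, the "first and second line" rule can be checked against
present-day Python 3, whose standard library exposes the lookup as
tokenize.detect_encoding().  This is a sketch under that assumption,
not something available at the time of this thread; the helper name
declared_encoding and the sample sources are hypothetical.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python's tokenizer would pick for this source."""
    # detect_encoding() reads at most two lines through the readline callable.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Cookie on the first line.
first = b"# -*- coding: euc-jp -*-\nx = 1\n"
# Cookie on the second line, after a #! line.
second = b"#!/usr/bin/env python\n# -*- coding: euc-jp -*-\nx = 1\n"
# Cookie on the third line: too late, so the default encoding applies.
third = b"#!/usr/bin/env python\n# a comment\n# -*- coding: euc-jp -*-\nx = 1\n"

for name, src in (("first", first), ("second", second), ("third", third)):
    print(name, declared_encoding(src))
```

A cookie on the third line is ignored and the tokenizer falls back to
its default (UTF-8 in Python 3), so only the first two lines count.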

    BAW> Can you lend any insight on where XEmacs is headed with this?

    (1) latin-unity package

http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/latin-unity-0.91-pkg.tar.gz

        Small, and the docs are useful in this discussion, maybe.  It
        does not specifically address the coding cookie issue, but I
        could add support for coding cookies.  (I'd prefer to delete
        any I find ;-)  It would be an easy extension of the basic
        idea to update existing ones, or even add them (but somebody
        will have to bribe me to get that in; what was that you were
        saying about $5000 checks that haven't quite cleared yet?).

    (2) Robust core Unicode support (being merged to the devel tree
        at this very moment according to CVS traffic)

    (3) Unicode internal coding, possibly an experimental option for
        the next major release. 


Footnotes: 
[1]  On this issue.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
              Don't ask how you can "do" free software business;
              ask what your business can "do for" free software.

--AkxAgBCuFJ--