[I18n-sig] Re: CJKCodecs 0.9 is released

Matt Gushee <mgushee@havenrock.com>
Sat, 21 Jun 2003 16:51:07 -0600


Some may consider this off-topic, but I don't believe the right course
of action here can be decided on purely technical grounds. So here goes:

On Sat, Jun 21, 2003 at 11:16:22PM +0200, Martin v. Löwis wrote:
> 
> > This is a ridiculously pedantic approach that will end up pissing
> > people off: the PUA in Unicode is designed for this purpose, so it
> > should be used.
> 
> It is fine if users are aware that this happens. If they are not, they
> will be pissed off when they find out.

Could be, if by "users" you mean developers that use the library. I
doubt that more than a minuscule fraction of end users has even heard of
Unicode. They just want working software and readable documents. And I
think that has a lot to do with the success of Shift-JIS, even though it
is the epitome of bad design: at the time it was developed, half-width
katakana were in widespread use, and Shift-JIS made it easy to
accommodate that need.

> > Where does it say you cannot encode PUA characters in UTF-8? If
> > you have a custom font that handles these code points, then you are
> > going to be upset that you can't display them because the author of
> > the codec decided that PUA characters are an abomination that should
> > be stricken from the earth.
> 
> And if you don't have such a font, you will see some replacement
> characters.

Well, I don't have an intimate knowledge of how CJKV character sets are
used on a daily basis, but I do have a broad knowledge of how society
works in at least Japan and mainland China (been to both, studied the
history in school, lived in Japan for seven years), and I would guess
that the availability of fonts in any given scenario is somewhat
analogous to the availability of XML DTDs: organizations (or
individuals) tend to have the same technology (fonts, software, etc.)
as other organizations that they are likely to exchange documents with.
That's not unique to Asia, of course, but I have the impression it is
more true there than in the West.
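
On the narrower UTF-8 question quoted above, a minimal sketch may help,
assuming Python 3 string semantics (my assumption, not something the
thread specifies). It only shows that a code point from the Basic
Multilingual Plane's Private Use Area (U+E000 to U+F8FF) round-trips
through UTF-8 like any other character; whether a given reader has a
font for it is a separate matter. The variable names are illustrative.

```python
import unicodedata

# U+E000 is the first code point of the BMP Private Use Area.
pua_char = "\ue000"

# UTF-8 has no special rule for private-use code points: they are
# encoded with the ordinary three-byte pattern for U+0800..U+FFFF.
encoded = pua_char.encode("utf-8")
decoded = encoded.decode("utf-8")

# "Co" is the Unicode general category for private-use characters.
category = unicodedata.category(pua_char)

print(encoded, decoded == pua_char, category)
```

So the encoding layer itself does not object; the disagreement is about
whether a codec should map legacy characters into that range at all.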

> Private characters should never leave the scope of "the application",
> and some effort should be done to make sure they don't leak out of
> "the application".

If by "application," you mean a particular software program or a closely
coordinated set of programs, I very much doubt that goal is achievable
in the foreseeable future. Maybe if you took a somewhat broader view and
said something like "system," encompassing both software and a set of
business practices, it would be realistic.

> There is nothing one can do, except to have users always declare their
> encodings properly, to use only data formats which include charset
> declarations, to use only charset names that are unambiguous,
> preferably even over time, etc. If people don't follow these rules,
> some things will go wrong. Then, people will learn to correct their
> errors.

No, rigid enforcement of standards is not the only choice. The
alternative is to determine what non-standard practices (or de-facto
standard practices) are most common, and attempt to accommodate those. I
honestly don't know which is better, but philosophically I favor
usability over correctness (of course, the two aren't necessarily at
odds in the long term, but often seem to conflict in the short term).
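
To illustrate what "accommodating de-facto practice" could look like, here
is a rough sketch, again assuming Python 3 and its bundled `shift_jis`
and `cp932` codecs. The function `decode_lenient` and its `fallbacks`
parameter are hypothetical names of mine, not part of CJKCodecs or any
other library. It tries the declared charset strictly first, and only
then falls back to a common vendor variant, reporting which codec
actually succeeded so the caller can tell the difference.

```python
def decode_lenient(data, declared, fallbacks=("cp932",)):
    """Decode with the declared charset, then with de-facto variants.

    Returns (text, codec_name_that_worked). Hypothetical helper.
    """
    first_error = None
    for name in (declared, *fallbacks):
        try:
            return data.decode(name), name
        except UnicodeDecodeError as exc:
            # Remember the error from the declared charset so that it
            # is the one reported if every fallback fails as well.
            if first_error is None:
                first_error = exc
    raise first_error


# Plain JIS X 0208 text decodes with the declared charset.
ok_text, ok_codec = decode_lenient("日本".encode("shift_jis"), "shift_jis")

# 0x87 0x40 is a vendor-extension character (circled digit one) that
# cp932 defines but strict shift_jis does not.
ext_text, ext_codec = decode_lenient(b"\x87\x40", "shift_jis")
```

Returning the codec name is the "gentle steering" part: the data is
readable today, and the mislabeling is still visible to whoever wants
to fix it.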

Adherence to standards is a good thing, but you also have to deal with
the social context where your product is being used. Consider the case
of, say, the typical harried IT manager in a Tokyo insurance firm. He
needs to plan the development of a new Web application; the project
requirements call for a very high-level dynamic language. Well, that
gives him several choices, doesn't it? And let's suppose that Python
requires his team to "always declare their encodings properly, to use
only charset names that are unambiguous ..." and so on. And suppose
one of the alternatives (I don't know, perhaps Ruby?) "just works" for
his use cases. Well, then, why should he use Python?

I'm not suggesting that the goal of standards-compliance be discarded
for the sake of popularity, now or ever. But sometimes you need to be a
little less forceful: give users something that works for them today,
while gently steering them toward the "right" path.

Python is good technology, and good technology should be widely used.
But if correctness comes at the expense of usability, you're just going
to drive people away.

-- 
Matt Gushee                 When a nation follows the Way,
Englewood, Colorado, USA    Horses bear manure through
mgushee@havenrock.com           its fields;
http://www.havenrock.com/   When a nation ignores the Way,
                            Horses bear soldiers through
                                its streets.
                                
                            --Lao Tzu (Peter Merel, trans.)