[I18n-sig] Python and Unicode == Britain and the Euro?

Andy Robinson andy@reportlab.com
Sat, 10 Feb 2001 16:13:37 -0000


This reminds me a lot of another debate going on close to home :-)

- people who are in favour assume everyone else is, and that the only
question is how to get there
- people who are against are just plain worried but can't say why
- the government stays very quiet and avoids asking for a referendum

I want to re-ask the big question:  is it desirable that the
standard string type should become a Unicode string one day?

To my knowledge, all the pressure for making Unicode strings
the fundamental data type comes from Americans and Western
Europeans who think they are doing the right thing. This is
far from proven.  Please consider these points:

1. To my knowledge we have not seen posts from anyone outside the
ISO-Latin-1 zone in this thread.

2. I have been told that there are angry mumblings on the
Python-Japan mailing list that such a change would break all
their existing Python programs; I'm trying to set up my tools to
ask out loud in that forum.

3. Ruby was designed in Japan and that's where most of its users are.
They have a few conversion functions and seem perfectly happy.

4. Visual Basic running under Windows 2000 with every international
option I can find will accept Unicode characters in string literals
but will not accept characters outside of ISO-Latin-1 in

5. All the Japanese-written code I have seen (not much of it
is in Python, lots in Java and VB) either uses English variable
names or the romanized Japanese ones ('total'='gokei').  No
one I know of has complained about this limitation.

I do NOT want to kill off this discussion, which is producing
an interesting proposal and I am in favour of many of the points it
raises.   However, I think we should make a real effort to
see what the market actually wants and if the implied goal
is right.  It would be tragic to break old code one day
for improvements nobody cares about  - or, worse, to
alienate exactly the people we are trying to cater for.

Now, who can we ask outside our own community who could
have insights into this?  My shortlist so far is:

- Frank Chen (wrote our Chinese and Korean Codecs)
- Tamito Kajiyama (wrote our Japanese Codecs)
- Ruby mailing lists
- Python Japan Mailing List
- Basis Technologies (Tom, are you there?)
- Digital Garage and recent escapees (Brian?)
- the CTO of a Kuwaiti bank I know
- Ken Lunde (author of that CJKV book)
- Tony Graham (author of Unicode: A Primer and a member of the
Unicode consortium)
- James Clark (XML fame, lives in Thailand)

I'm going to try to think up a questionnaire. If anyone can suggest
other domain experts, or mailing lists of user groups in other
language zones, I will be happy to try and pursue them and get some
real hard data.

Best Regards,

Andy Robinson