[I18n-sig] Python and Unicode == Britain and the Euro?

Paul Prescod paulp@ActiveState.com
Sat, 10 Feb 2001 11:17:19 -0800

Andy, I think that part of the reason that Westerners push harder for
Unicode than the Japanese do is that we are pressured (rightly) to write
software that works worldwide, and it is simply not sane to try to do
that by supporting multiple character sets. Multiple encodings maybe.
Multiple character sets? Forget it.
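
To illustrate the distinction drawn above between one character set and
multiple encodings, here is a minimal sketch in present-day Python. The
sample text and the choice of codecs are assumptions made for the
example; they are not from the original post.

```python
# One character set (Unicode), several encodings of the same text.
text = "\u65e5\u672c\u8a9e"  # three code points: U+65E5, U+672C, U+8A9E

# Encode the same code points with a few standard-library codecs.
codecs = ("utf-8", "utf-16-le", "shift_jis", "euc_jp")
encoded = {name: text.encode(name) for name in codecs}

# The byte sequences differ, but each one decodes back to the same
# code points, so the program only has to reason about one character set.
round_trips = all(data.decode(name) == text for name, data in encoded.items())
code_points = [hex(ord(ch)) for ch in text]
```

The encodings matter only at the input/output boundary; everything in
between works on the same code points.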

I don't know of any commercial software written in Japan but used in the
West, so I think that they probably have less I18N pressure than we do.
Unicode is only interesting when you want the same software to run in
multiple character set environments!

Andy Robinson wrote:
> ...
> 2. I have been told that there are angry mumblings on the
> Python-Japan mailing list that such a change would break all
> their existing Python programs; I'm trying to set up my tools to
> ask out loud in that forum.

I don't think it is possible to say in the abstract that a move to
Unicode would break code. Depending on implementation strategy it might.
But I can't imagine there is really a ton of code that would break
merely from widening the character.
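
As a rough illustration of that point, the sketch below (present-day
Python, with a hypothetical helper named count_vowels) shows code written
with only narrow characters in mind continuing to work when a character
with an ordinal above 255 appears in the string. It says nothing about
any particular implementation strategy.

```python
def count_vowels(s):
    """Hypothetical helper, written with only ASCII text in mind."""
    return sum(1 for ch in s if ch in "aeiou")

narrow = "character"           # every ordinal is below 256
wide = "character \u652b"      # appends U+652B, whose ordinal is above 255

# Iteration, membership tests and len() do not care how wide a character is.
narrow_count = count_vowels(narrow)
wide_count = count_vowels(wide)
widest_ordinal = max(ord(ch) for ch in wide)
```

Code that would break is code that assumes one character equals one
byte, for example when sizing buffers or writing raw bytes to a file.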

> 3. Ruby was designed in Japan and that's where most of its users are.
> They have a few conversion functions and seem perfectly happy.

Don't know enough to comment except to point out that Ruby has a command
line option to set the character set to Kanji.

> 4. Visual Basic running under Windows 2000 with every international
> option I can find will accept unicode characters in string literals
> but will not accept characters outside of ISO-Latin-1 in

The VB in Visual Studio 7 will happily accept wide characters (e.g.
U+652B: CJK Unified Ideograph) on Windows 2000. Of course you need to
choose a font that contains the character.

Compared to where we were a few years ago (better install DOS-J!) this
is a real miracle. Of course Unix systems will move over more slowly
(grumble..). Nevertheless it's coming:


> I'm going to try to think up a questionnaire. If anyone can suggest
> other domain experts, or mailing lists of user groups in other
> language zones, I will be happy to try and pursue them and get some
> real hard data.

I like your list but I don't know that there is really a reasonable
question we can ask. 

What does it mean for Python's "standard string type" to be "Unicode?"
If you ask the question as: "Should Python's standard string type
support ordinal values beyond 255?", who would say no? If you say:
"Should Python standardize on the Unicode character set" you might get
different answers. As you yourself point out, it depends on whether that
means that you would LOSE the ability to do string-like things on

 Paul Prescod