[Python-Dev] Pre-PEP: Python Character Model

Neil Hodgson nhodgson@bigpond.net.au
Wed, 7 Feb 2001 22:44:36 +1100


[Paul Prescod discusses Unicode enhancements to Python]

   Another approach being pursued, mostly in Japan, is Multilingualization
(M17N),
http://www.m17n.org/
   This is supported by the appropriate government department (MITI) and is
being worked on in some open source projects, most notably Ruby. For some
messages from Yukihiro Matsumoto search deja for M17N in comp.lang.ruby.

   Matz: "We don't believe there can be any single character-encoding that
encompasses all the world's languages.  We want to handle multiple encodings
at the same time (if you want to)."

   The approach taken in the next version of Ruby is for all string and
regex objects to have an encoding attribute and for there to be
infrastructure to handle operations that combine encodings.
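   To make the idea concrete, here is a minimal Python sketch (not Ruby's
actual implementation, and all names are invented) of strings that carry an
encoding attribute, with the combination rule checked at operation time:

```python
class EncodedString:
    """Byte string tagged with its encoding; concatenation checks that
    the encodings are compatible instead of assuming one universal
    character set.  A real system would have a compatibility table and
    transcoding machinery; this sketch simply refuses to mix encodings."""

    def __init__(self, data, encoding):
        self.data = data          # raw bytes in the named encoding
        self.encoding = encoding  # e.g. "euc-jp", "shift_jis", "utf-8"

    def __add__(self, other):
        if self.encoding != other.encoding:
            raise ValueError("cannot combine %s with %s"
                             % (self.encoding, other.encoding))
        return EncodedString(self.data + other.data, self.encoding)


a = EncodedString(b"abc", "euc-jp")
b = EncodedString(b"def", "euc-jp")
print((a + b).data)  # b'abcdef'
```

   Regex objects would carry the same attribute, so a pattern compiled for
one encoding could refuse to match (or be recompiled against) text in
another.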

   One of the things that is needed in a project that tries to fulfill the
needs of large character set users is to have some of those users involved
in the process. When I first saw proposals to use Unicode in products at
Reuters back in 1994, it looked to me (and the proposal originators) as if
it could do everything anyone ever needed. It was only after strenuous and
persistent argument from the Japanese and Hong Kong offices that it became
apparent that Unicode just wasn't enough. A partial solution then was to
include language IDs encoded in the Private Use Area. This was still under
discussion when I left; while it went some way toward satisfying those
needs, some unhappiness remained.
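   For illustration only, a workaround of that shape might look like the
following sketch: tag runs of Unicode text with a language ID stored as a
Private Use Area code point (U+E000-U+F8FF). The tag base, the language
assignments, and the function names are all invented here, not taken from
the Reuters proposal.

```python
LANG_TAG_BASE = 0xE000                 # start of the BMP Private Use Area
LANG_IDS = {"ja": 1, "zh-HK": 2}       # hypothetical language assignments


def tag_language(text, lang):
    """Prefix text with a PUA code point identifying its language."""
    return chr(LANG_TAG_BASE + LANG_IDS[lang]) + text


def split_tag(tagged):
    """Recover (language, text) from a tagged string."""
    code = ord(tagged[0]) - LANG_TAG_BASE
    lang = {v: k for k, v in LANG_IDS.items()}[code]
    return lang, tagged[1:]


lang, text = split_tag(tag_language("tokyo", "ja"))
```

   The unhappiness is easy to see even from the sketch: the tags are
invisible in-band data that every string operation (slicing, searching,
length) must be taught to skip.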

   If Python could cooperate with Ruby here, then not only could code be
shared, but Python would gain access to developers with large character set
/needs/ and experience.

   Neil