[Python-Dev] Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 16 May 2000 20:43:34 +0200

> it is real.  I won't repeat the arguments one more time; please read
> the W3C character model note and the python-dev archives, and read
> up on the unicode support in Tcl and Perl.

I did read all that, so there really is no point in repeating the
arguments. Yet I'm still not convinced. One reason may be that all
your commentary either

- discusses an alternative solution to the existing one, merely
  pointing out the difference, without any strong selling point, or
- explains small examples that work counter-intuitively.

I'd like to know whether you have an example of a real-world
big-application problem that could not be conveniently implemented
using the new Unicode API. For all the examples I can think of where
Unicode would matter (XML processing, CORBA wstring mapping,
internationalized messages and GUIs), it would work just fine.

So while it may not be perfect, I think it is good enough. Perhaps my
problem is that I'm not a perfectionist :-)

However, one remark from http://www.w3.org/TR/charmod/ reminded me of
an earlier proposal by Bill Janssen. The Character Model says

# Because encoded text cannot be interpreted and processed without
# knowing the encoding, it is vitally important that the character
# encoding is known at all times and places where text is exchanged or
# stored.

While they were considering document encodings, I think this applies
in general. Bill Janssen's proposal was that each (narrow) string
should have an attribute .encoding. If set, you'll know what encoding
a string has. If not set, it is a byte string, subject to the default
encoding. I'd still like to see that as a feature in Python.
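
To make the idea concrete, here is a rough sketch of a narrow string
carrying an optional .encoding attribute. The class name TaggedBytes
and the method to_text are hypothetical, made up for illustration; they
are not taken from Bill Janssen's proposal or from any existing API.
The sketch is written against the bytes type of current Python, and it
assumes that "subject to the default encoding" means falling back to
sys.getdefaultencoding().

```python
import sys


class TaggedBytes(bytes):
    """Hypothetical narrow string that may record its own encoding."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        # None means "plain byte string, encoding unknown".
        self.encoding = encoding
        return self

    def to_text(self):
        # Assumption: an untagged string is decoded with the default encoding.
        return self.decode(self.encoding or sys.getdefaultencoding())


tagged = TaggedBytes(b"\xe4", encoding="latin-1")
untagged = TaggedBytes(b"abc")

print(tagged.encoding, tagged.to_text())      # tagged: the encoding is known
print(untagged.encoding, untagged.to_text())  # untagged: default encoding applies
```

With the attribute set, any consumer of the string can tell how to
interpret the bytes; without it, the string behaves like an ordinary
byte string.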