[XML-SIG] Re: Issues with Unicode type

Fredrik Lundh fredrik@pythonware.com
Tue, 24 Sep 2002 16:34:16 +0200


fred wrote:

> I've just added a note to the docs for Python 2.2.2 and 2.3 that len()
> returns the number of storage units, not abstract characters.

imo (as the original author of the unicode type), that's an implementation
artifact, not a feature.
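
to make the distinction concrete, here is a minimal sketch of the two counts
for one character outside the BMP. it is written for Python 3, where len()
counts code points; the helper name utf16_units is made up for illustration,
and the storage-unit count is obtained by encoding to UTF-16, which should
correspond to what len() reports on a build that stores strings as UTF-16.

```python
def utf16_units(text):
    """Return the number of 16-bit storage units needed for text in UTF-16.

    Hypothetical helper for illustration; not part of any library.
    """
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2


s = "\U00010400"  # a single character outside the Basic Multilingual Plane

code_points = len(s)            # Python 3 counts code points
storage_units = utf16_units(s)  # UTF-16 needs a surrogate pair for it

print(code_points, storage_units)
```

for text that stays inside the BMP the two counts agree, which is why the
difference is easy to overlook.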

> I don't expect that to change given that it's been doing it that way since
> the Unicode type was introduced.

the original Unicode type used UCS-2 for internal storage, and all string
operations worked on code points.

adding UTF-16 support in a couple of places doesn't really change that;
a UTF-16-encoded unicode string should be treated just like an encoded
8-bit string -- standard string operations are not guaranteed to work on
encoded strings.
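
as a rough illustration of why ordinary string operations are unsafe on
encoded data, the sketch below slices a sequence of UTF-16 code units in the
middle of a surrogate pair. it is written for Python 3 and models the code
units as a plain list of integers; the helper names to_units and from_units
are assumptions made for this example, not an existing API.

```python
import struct


def to_units(text):
    """Encode text to a list of 16-bit UTF-16 code units (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def from_units(units):
    """Decode a list of UTF-16 code units back to text (hypothetical helper)."""
    data = struct.pack("<%dH" % len(units), *units)
    return data.decode("utf-16-le")


units = to_units("a\U00010400b")  # 'a', high surrogate, low surrogate, 'b'

# Slicing by storage unit cuts the surrogate pair in half.
first_two = units[:2]

try:
    from_units(first_two)
    slice_is_valid = True
except UnicodeDecodeError:
    # The slice ends in a lone high surrogate, so it is not valid UTF-16.
    slice_is_valid = False

print([hex(u) for u in units], slice_is_valid)
```

the same slice taken on code points would return "a" plus the whole
character, which is the behaviour string operations are expected to have.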

(if we document all bugs and half-baked solutions as supported features,
we will never be able to fix anything...)

</F>