[XML-SIG] Re: Issues with Unicode type
Fredrik Lundh
fredrik@pythonware.com
Tue, 24 Sep 2002 16:34:16 +0200
fred wrote:
> I've just added a note to the docs for Python 2.2.2 and 2.3 that len()
> returns the number of storage units, not abstract characters.
imo (as the original author of the unicode type), that's an implementation
artifact, not a feature.
> I don't expect that to change given that it's been doing it that way since
> the Unicode type was introduced.
the original Unicode type used UCS-2 for internal storage, and all string
operations worked on code points.
adding UTF-16 support in a couple of places doesn't really change that;
a UTF-16-encoded unicode string should be treated just like an encoded
8-bit string -- standard string operations are not guaranteed to work on
encoded strings.
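
To illustrate the distinction, here is a minimal sketch in present-day
Python 3 (not code from this thread). It assumes a str indexed by code
points and simulates UTF-16 storage by encoding to UTF-16-LE; utf16_units
is a hypothetical helper, not a library function.

```python
def utf16_units(s):
    """Hypothetical helper: number of 16-bit code units s needs in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

# U+10000 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
text = "a\U00010000b"

code_points = len(text)         # counts code points
code_units = utf16_units(text)  # counts UTF-16 storage units

# Slicing the *encoded* form by storage units can cut a surrogate pair in
# half, just as slicing a multi-byte 8-bit encoding can cut a character.
encoded = text.encode("utf-16-le")
try:
    encoded[:4].decode("utf-16-le")  # "a" plus a lone high surrogate
    slice_is_valid = True
except UnicodeDecodeError:
    slice_is_valid = False

print(code_points, code_units, slice_is_valid)
```

The two counts differ as soon as a character outside the BMP appears, and
the slice of the encoded data no longer decodes, which is the sense in which
standard string operations are not guaranteed to work on encoded strings.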
(if we document all bugs and half-baked solutions as supported features,
we will never be able to fix anything...)
</F>