[XML-SIG] Re: Issues with Unicode type

Eric van der Vlist vdv@dyomedea.com
23 Sep 2002 22:34:17 +0200


On Mon, 2002-09-23 at 21:29, Tom Emerson wrote:

> Sure, but the *implementation* within the Python interpreter is
> treating characters in the astral planes as two 16-bit words, not
> one. The len() value that you get is the number of UTF-16-encoded
> words in the string. There was a very long, very drawn out discussion
> on the representation of Unicode characters in Python a while back on
> the python-i18n mailing list where this whole thing was beaten to
> death and which eventually led to the option to compile the
> interpreter to use a 32-bit character representation.

Having gone through this thread in the archives, I don't want to open it
again :-)... OTOH, would it really be an option to say that feature X or
Y of PyXML (if such a library were added at some point) would require an
interpreter compiled for 32-bit character representation to be
compliant? Assuming that all the common distributions are shipped
compiled for 16-bit (like the Debian sid on which I am doing these
tests), it would become a real nightmare for the users!
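
To make the difference concrete, here is a minimal sketch of how a
library could tell which kind of interpreter it is running on and how
many 16-bit words a string really needs. It assumes sys.maxunicode is
available; the helper names utf16_code_units and is_wide_build are
hypothetical and not part of PyXML.

```python
import sys


def utf16_code_units(text):
    """Return the number of 16-bit words needed to encode text as UTF-16."""
    # utf-16-le writes no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


def is_wide_build():
    """Return True if the interpreter can hold any code point in one character."""
    # A 16-bit build reports 0xFFFF here; a 32-bit build reports 0x10FFFF.
    return sys.maxunicode > 0xFFFF


# U+10000 is the first code point outside the Basic Multilingual Plane.
astral = u"\U00010000"

print("wide build: %s" % is_wide_build())
print("len(): %d" % len(astral))
print("UTF-16 code units: %d" % utf16_code_units(astral))
```

On a 16-bit build len() agrees with the UTF-16 count for this string,
while on a 32-bit build it reports a single character. A library could
use such a check to warn rather than silently miscount.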

Eric
-- 
Rendez-vous à Paris.
                          http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------