[XML-SIG] Re: Issues with Unicode type
Uche Ogbuji
uche.ogbuji@fourthought.com
Tue, 24 Sep 2002 17:52:21 -0600
>
> Martin v. Loewis writes:
> > 3. Implement it properly. Please understand that you will be trading
> > efficiency for correctness.
>
> I'm sure a small C extension could provide the needed helpers quite
> efficiently. Even with a UCS-4 version of Python, a Unicode literal
> containing a surrogate pair (explicitly, using two \u sequences) will
> exhibit the behavior that Eric wants to see suppressed.
Yes. That was what I figured out in my recent rumination on such literals. My
conclusion was *never* to use "naked" surrogate pairs in Unicode literals,
even with UTF-16 Python. I get the sense this is a "best practice" that
should be clearly articulated:
Do *not* express Unicode literals using direct UTF-16 surrogate pairs, e.g.
u"\uD800\uDC00". *Always* use the eight-digit Unicode escape for characters
above U+FFFF (big-U notation), e.g. u"\U00010000".
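A minimal sketch of the difference, assuming Python 3.3 or later, where strings
always hold full code points and so behave like the UCS-4 build described above.
The helper names can_encode_utf8 and join_surrogates are made up for
illustration and are not part of any library:

```python
# Assumption: Python 3.3+, which behaves like the UCS-4 build discussed above.
pair = u"\uD800\uDC00"   # two separate surrogate code points
big_u = u"\U00010000"    # one code point, U+10000

def can_encode_utf8(s):
    """Return True if s can be encoded as UTF-8 (hypothetical helper)."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Unpaired surrogate code points are not valid in UTF-8.
        return False

def join_surrogates(s):
    """Merge surrogate pairs into single code points (hypothetical helper).

    Round-trips through UTF-16, letting surrogates pass on the way out so
    that the decoder sees them as ordinary UTF-16 pairs on the way back.
    """
    return s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

print(len(pair))    # 2
print(len(big_u))   # 1
print(pair == big_u)                    # False
print(can_encode_utf8(pair))            # False
print(join_surrogates(pair) == big_u)   # True
```

On such a build the two spellings are simply different strings, which is why
the big-U form is the safer one to write.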
Unless someone weighs in with reasoning against this, I'll plan to add
something to this effect to the Akara.
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html