[XML-SIG] Re: Issues with Unicode type
Uche Ogbuji
uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 14:42:51 -0600
> > On Mon, 2002-09-23 at 21:27, Uche Ogbuji wrote:
> Having said all this, Martin is right about XML and the BMP. I'd forgotten.
See, I knew I'd make a fool of myself before this thread got very long.
I wasn't even properly reading what I was quoting from the XML spec:
> Character Range
> [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks,
> FFFE, and FFFF. */
>
> """
>
> So &#x10800; is not WF XML. I'm not sure why JJC uses it.
So I was wrong: &#x10800; is indeed WF, since it falls in the
[#x10000-#x10FFFF] range above. The problem remains that XML processing
code will have to augment Python built-ins such as len with intelligence
about surrogates :-(
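
To make that concrete, here is a rough sketch, not code from any existing
library. The helper names (is_xml_char, utf16_units, code_point_len) are
made up for illustration. It assumes the text is held as UTF-16 code units,
which is how a narrow Unicode build of Python stores it, so that a character
outside the BMP such as U+10800 occupies two units. The first function just
transcribes production [2] quoted above.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production [2]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_point_len(units):
    """Count characters in UTF-16 code units, treating a surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate encodes one character.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


units = utf16_units(chr(0x10800))
print(is_xml_char(0x10800))   # the code point itself is a legal Char
print(len(units))             # what a code-unit-based len would report
print(code_point_len(units))  # what XML processing code actually wants
```

A lone surrogate is counted as one unit here rather than rejected; real
code would probably want to flag it, since the surrogate blocks are
excluded from Char.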
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html