[XML-SIG] Re: Parsing a unicode string
Andrew Clover
and-xml at doxdesk.com
Sun Oct 10 19:52:07 CEST 2004
> It really only makes sense to describe XML parsing in terms of byte
> streams.
Certainly this has traditionally been the case.
In DOM Level 3 LS, however, LSInput can now specify a character input
source (characterStream or stringData properties) in which no attempt is
made to do byte-to-character decoding.
There was a bit of a kerfuffle over what inputEncoding such Documents
should report; 'utf-16' was decided on as this is DOM's native string
type. Unfortunately this doesn't sit well with Python, where a
DOM-acceptable string might be narrow (16-bit code units) or, in the
case where Python is compiled with wide chars, 32 bits per character.
(pxdom plumps for reporting 'utf-8' and 'utf-32' in these cases, but
it's not really clear-cut.)
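
For anyone wanting to see which kind of build they have, here is a minimal sketch. It only relies on sys.maxunicode; the names IS_WIDE_BUILD and ASTRAL are mine, purely for illustration, and this is not what pxdom itself does internally.

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF on a wide one.
# (Python 3.3 and later always report 0x10FFFF.)
IS_WIDE_BUILD = sys.maxunicode > 0xFFFF

# A character outside the Basic Multilingual Plane: one unit on a wide
# build, a surrogate pair (two units) on a narrow build.
ASTRAL = u'\U00010000'

print('wide build:', IS_WIDE_BUILD)
print('len of one astral character:', len(ASTRAL))
```

So the same DOM string can have a different length depending on how the interpreter was compiled, which is why a single fixed inputEncoding answer is awkward.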
Anyway, as a consequence, pxdom can indeed accept Unicode strings passed
to parseString, but this can't be relied upon in other implementations,
especially DOM Level 2 ones.
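
If portability matters, one workaround is to encode the string to bytes yourself before handing it to the parser. The following is only a sketch under some assumptions: it uses xml.dom.minidom from the standard library, parse_text is a hypothetical helper name, and the document is assumed to carry no XML declaration naming a different encoding (otherwise the declaration and the actual bytes would disagree).

```python
from xml.dom import minidom

def parse_text(text):
    """Parse XML held in a character string by encoding it to UTF-8 first.

    Hypothetical helper: assumes the text has no encoding declaration
    that contradicts UTF-8.
    """
    return minidom.parseString(text.encode('utf-8'))

doc = parse_text(u'<greeting lang="fr">caf\u00e9</greeting>')
root = doc.documentElement
print(root.tagName, root.getAttribute('lang'))
```

This puts you back in the traditional byte-stream world, so it should behave the same whichever DOM implementation ends up doing the parsing.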
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/