[XML-SIG] XML Unicode and UTF-8
n.youngman at ntlworld.com
Sat Aug 7 08:48:18 CEST 2004
On Thursday 05 Aug 2004 9:27 pm, Mike Brown wrote:
> Paul Boddie wrote:
> > Do this instead:
> > utext = segment.decode( encoding )
> The resulting Unicode object may contain characters which are not allowed
> in XML, and thus the text may not be serializable (at least not in a way
> that would produce well-formed XML).
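A minimal Python 3 sketch of the two points quoted above (the thread itself dates from the Python 2 era). The decode call must be given the encoding the bytes were actually written in, and a successfully decoded string can still contain characters outside the XML 1.0 "Char" production. The input bytes, the "latin-1" encoding and the helper names are hypothetical, chosen only for illustration.

```python
import re

# Characters permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything matched by this negated class is NOT allowed in an XML 1.0 document.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def decode_segment(segment: bytes, encoding: str) -> str:
    """Decode raw bytes using the encoding they were actually written in."""
    return segment.decode(encoding)

def xml_illegal_chars(text: str) -> list:
    """Return every character in text that XML 1.0 does not allow."""
    return _XML_ILLEGAL.findall(text)

raw = b"caf\xe9 \x01 ok"              # hypothetical Latin-1 input with a control byte
text = decode_segment(raw, "latin-1")
print(repr(text))                      # 'café \x01 ok'
print(xml_illegal_chars(text))         # ['\x01']
```

Decoding succeeds here, yet U+0001 survives into the Unicode string, which is exactly the case Mike describes.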
Yes, but it's being written out through a UTF-8 codec to a file which
specifies 'charset="utf-8"'. AIUI the Python UTF-8 codec can detect that it's
got a Unicode string and convert it to UTF-8 with no further programmer
intervention.
Of course, a week ago Python was just another buzzword to me, so I could be
wrong.
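A small Python 3 check of that assumption, offered as a sketch. The UTF-8 codec will encode any ordinary string, including control characters, so passing text through it says nothing about whether the result is well-formed XML. Here ElementTree (backed by expat) stands in for "a conforming parser"; the sample text and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "ok \x01 not ok"   # hypothetical content containing U+0001, illegal in XML 1.0

# Escaping &, < and > handles markup characters, and encoding to UTF-8 succeeds...
payload = ('<?xml version="1.0" encoding="utf-8"?>\n'
           "<doc>" + escape(text) + "</doc>").encode("utf-8")

def is_well_formed(data: bytes) -> bool:
    """Return True if ElementTree can parse data as an XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True

# ...but the control character is still there, so the parser rejects the document.
print(is_well_formed(payload))             # False
print(is_well_formed(b"<doc>fine</doc>"))  # True
```

In other words, the codec takes care of the byte encoding, while keeping the character repertoire within what XML allows remains the programmer's job.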
> To embed arbitrary bytes in XML, the usual advice is to first convert the
> bytes into a character sequence that is permitted in XML. Base64 is a
> popular and easily implemented option, albeit inefficient. The article at
> http://www.javaworld.com/javaworld/javatips/jw-javatip117-p2.html suggests
> that a custom Huffman implementation is nearly 1:1. I've mapped bytes into
> the Private Use Area of Unicode before, too, although that's definitely not
All neat ideas, but as I want UTF-8 encoding, they would just add an
unnecessary layer of complexity.
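For anyone who does need to carry arbitrary bytes, here is a rough Python 3 sketch of two of the options Mike mentions: base64 text, and a one-to-one mapping of bytes into the Private Use Area. The element name "blob" and the choice of U+E000 as the PUA base are hypothetical; nothing in the thread fixes either.

```python
import base64
import xml.etree.ElementTree as ET

def embed_base64(data: bytes) -> bytes:
    """Wrap arbitrary bytes in a (hypothetical) <blob> element as base64 text."""
    elem = ET.Element("blob")
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="utf-8")

def extract_base64(xml_bytes: bytes) -> bytes:
    """Recover the original bytes from embed_base64's output."""
    return base64.b64decode(ET.fromstring(xml_bytes).text)

PUA_BASE = 0xE000  # hypothetical choice: start of the BMP Private Use Area

def bytes_to_pua(data: bytes) -> str:
    """Map each byte 0-255 to one character in U+E000..U+E0FF."""
    return "".join(chr(PUA_BASE + b) for b in data)

def pua_to_bytes(text: str) -> bytes:
    """Invert bytes_to_pua."""
    return bytes(ord(ch) - PUA_BASE for ch in text)

data = bytes(range(256))   # every possible byte value, including NUL and controls
assert extract_base64(embed_base64(data)) == data
assert pua_to_bytes(bytes_to_pua(data)) == data
```

Both encodings only produce characters that XML 1.0 permits, so the result always serializes. Note that each PUA character costs three bytes once written as UTF-8, so in a UTF-8 file the PUA mapping is larger than base64's roughly 4:3 overhead, and any consumer has to know the private mapping.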
Thanks for trying to help, but I think I've got what I need.