[XML-SIG] Handling of character entity references

Martin v. Löwis martin@v.loewis.de
25 May 2003 19:55:26 +0200


pyxml@wonderclown.com writes:

> This brings in the XHTML Latin-1 entities, which seems to work well
> enough to get the parser to accept the source, but then é gets
> translated to the following two-byte sequence on output: 0xC3
> 0xA9. 

[I had to think of what the problem might be]

You mean, you get this byte sequence in the pretty-printed XML?

This is just fine, and expected. 0xC3 0xA9 *is* the byte sequence that
represents é, at least in UTF-8, and UTF-8 is the default
encoding of XML. Unless there is a problem with that, I suggest you
accept the output as-is - it *is* the document you meant to produce.
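
To check this for yourself, here is a minimal sketch. It assumes a
current Python and uses the standard library's xml.etree.ElementTree
as a stand-in for whatever parser and printer you are actually using;
the tiny document and its internal entity declaration are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# U+00E9 is the character that the XHTML Latin-1 entity set maps eacute to.
e_acute = "\u00e9"

# Encoded as UTF-8, that one character occupies two bytes: 0xC3 0xA9.
utf8_bytes = e_acute.encode("utf-8")
print([hex(b) for b in utf8_bytes])

# Decoding the same two bytes as UTF-8 gives the original character back.
assert utf8_bytes.decode("utf-8") == e_acute

# Hypothetical sample document: the entity is declared in an internal
# subset so that the example does not need to fetch any external DTD.
source = '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'
root = ET.fromstring(source)

# The parser has already expanded the entity reference to the character.
assert root.text == "caf" + e_acute

# Serializing as UTF-8 writes the character as the two-byte sequence.
as_utf8 = ET.tostring(root, encoding="utf-8")
print(as_utf8)
```

The two bytes in the serialized output are the same character that the
entity reference stood for, so a UTF-8-aware reader of the file will
see the document unchanged.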

Regards,
Martin