[XML-SIG] XML and Unicode
Martin v. Loewis
Wed, 23 May 2001 22:01:50 +0200
> How does one detect the charset used in an XML document from a SAX2
> parser (PyXML 0.6.5)?
That is not supported in SAX. The underlying parser may expose this
information; but that is of course parser dependent.
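As an illustration of the parser-dependent route, here is a minimal sketch that bypasses SAX and asks Expat directly for the encoding named in the XML declaration. It assumes the standard-library xml.parsers.expat module of a current Python rather than PyXML 0.6.5, and the helper name declared_encoding is made up for this example.

```python
import xml.parsers.expat


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None.

    Hypothetical helper: it uses Expat's XmlDeclHandler callback,
    which SAX itself does not expose.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # encoding is None when the declaration omits it
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk
    return found.get("encoding")
```

If the document has no XML declaration at all, the callback never fires and the function returns None.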
> Also, if I have an XML document encoded ISO-8859-1 (and properly
> identified), should I have a reasonable expectation that the output
> of a SAX processor, post- .encode('utf-8'), should be correct if
> viewed in a Web browser with UTF-8 selected as a character encoding?
Not necessarily. If the document was an HTML document, and if it
contained a
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
line, then the browser has to decide whether it believes the XML header
or the Content-Type. It would normally use the content type, which
would be incorrect.
If there is no incorrect character set information in the output
document, then a receiver should display it properly.
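To make the re-encoding step concrete, here is a small sketch of the round trip: the SAX parser decodes the ISO-8859-1 input into Unicode strings, and the output is then encoded as UTF-8. It assumes the standard-library xml.sax module of a current Python; TextCollector and text_as_utf8 are names invented for this example, and it only collects character data rather than producing a full output document.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser has already decoded the input, so content is
        # Unicode text regardless of the source encoding.
        self.chunks.append(content)


def text_as_utf8(data):
    """Parse XML bytes and return their character data as UTF-8."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks).encode("utf-8")
```

The bytes returned are valid UTF-8 only as data; if they are written into a document that still carries a charset=iso-8859-1 declaration, the conflict described above applies.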
Of course, whether a Web browser can "correctly" display arbitrary XML
documents is a different question.