[XML-SIG] XML and Unicode
Wed, 23 May 2001 00:38:34 +0200
Mark Nottingham wrote:
> How does one detect the charset used in an XML document from a SAX2
> parser (PyXML 0.6.5)?
> Also, if I have an XML document encoded ISO-8859-1 (and properly
> identified), should I have a reasonable expectation that the output
> of a SAX processor, post- .encode('utf-8'), should be correct if
> viewed in a Web browser with UTF-8 selected as a character encoding?
This should work...
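
A minimal sketch of that round trip, assuming the standard-library
xml.sax module of a current Python rather than PyXML 0.6.5. The handler
class name and the sample document are made up for illustration:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser has already decoded the input bytes;
        # content arrives as a Unicode string.
        self.chunks.append(content)


# Sample document, stored as ISO-8859-1 bytes and declared as such.
doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<p>caf\u00e9</p>"
).encode("iso-8859-1")

handler = TextCollector()
xml.sax.parseString(doc, handler)

text = "".join(handler.chunks)      # Unicode string from the parser
utf8_bytes = text.encode("utf-8")   # bytes to send to a UTF-8 page
```

The bytes in utf8_bytes only display correctly if the browser really
does interpret the page as UTF-8.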
> In other words, is the post-parse unicode string a neutral
> representation of the 8859-x string, which can then be encoded as
Unicode is encoding-neutral in the sense that it provides
space for the characters of most scripts. If the parser returns
Unicode, then you can encode it as UTF-8 and have the original
contents of the attribute/element represented as a UTF-8 string.
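
To illustrate the point, here is a small sketch in current Python (an
assumption; the variable names are only for illustration). The same
Unicode character maps to different bytes under different encodings,
and decoding with the wrong codec garbles it:

```python
# One character, two byte representations.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = text.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xA9

# Decoding each with its own codec recovers the same Unicode string.
from_latin1 = latin1_bytes.decode("iso-8859-1")
from_utf8 = utf8_bytes.decode("utf-8")

# Reading the UTF-8 bytes as ISO-8859-1 yields two wrong characters
# instead of one correct one.
garbled = utf8_bytes.decode("iso-8859-1")
```

So the Unicode string itself carries no trace of the document's
original charset; only the encoded bytes depend on an encoding.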
> Or, is it in the charset of the original XML document (my
> testing seems to indicate the latter - what was an 8859 character in
> the original text does not successfully come out the other side)?
> (Sorry if this is obtuse - just getting into i18n, and Python docs
> are thin on the ground)
CEO eGenix.com Software GmbH
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/