[XML-SIG] Processing xml files with ISO 8859-1 chars
Thomas B. Passin
tpassin@home.com
Wed, 7 Nov 2001 09:43:38 -0500
It seems that this XML file should have caused an exception, since it is
not well-formed: it carries no XML declaration, so the parser has to presume
utf-8, and the actual encoding (iso-8859-1) does not match that. The fact
that the parse partially succeeded is disturbing.
I tried this example myself. I am running PyXML 6.6 on Windows 2000. I did
get an exception, but it was raised by the pretty-printer, not the parser.
Adding an XML declaration that names the actual iso-8859-1 encoding did in
fact allow the program to complete properly, as expected.
Why didn't the parser complain?
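For comparison, here is a minimal sketch of the behaviour I would expect. It
uses the expat-based minidom from the Python standard library rather than
PyXML's Sax2 reader, so it says nothing about why the PyXML parser let the
file through. The names parses, undeclared and declared are made up for
the illustration, and the document is a shortened version of pau.xml.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Shortened stand-in for pau.xml, stored as ISO-8859-1 bytes.
text = "<note><assunto>Houve mudanças nos preços?</assunto></note>"
undeclared = text.encode("iso-8859-1")

# Same bytes, preceded by a declaration naming the real encoding.
declared = b"<?xml version='1.0' encoding='iso-8859-1'?>\n" + undeclared


def parses(data):
    """Return True if expat accepts the bytes as well-formed XML."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


# No declaration: the bytes are read as utf-8, and the accented
# characters are not valid utf-8 sequences.
undeclared_ok = parses(undeclared)

# With the declaration, the same bytes are decoded as iso-8859-1.
declared_ok = parses(declared)

print("without declaration:", undeclared_ok)
print("with declaration:   ", declared_ok)
```

With expat, the undeclared version should be rejected by the parser itself,
and only the declared version should get as far as the pretty-printer.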
Cheers,
Tom P
[Rodrigo Senra]
>
> I don't know if I stepped in a bug or it is just my newbieness ;o)
> Trying to parse the file:
>
> ------------------- pau.xml -------------------
> <note>
> <assunto>
> This line is ok.
> This line has characters ISO-8859-1 with accents: Houve mudanças nos
> preços?
> Linha ok.
> </assunto>
> </note>
> ------------------ end of file pau.xml --------
>
> with the script:
>
> ------------------ file teste.py ----------------------------
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import PrettyPrint
>
> doc = Sax2.FromXmlStream(open('pau.xml'))
> PrettyPrint(doc,encoding='iso-8859-1')
> -------------------- end of teste.py script ------------
>
> produces:
>
> ----------- stdout trace -------------
> <?xml version='1.0' encoding='iso-8859-1'?>
> <!DOCTYPE note>
> <note>
> <assunto>
> This line is ok.
>
> Linha ok.
> </assunto>
> </note>
> ----------- end of trace -------------
>
> Am I doing something obviously wrong? Should I try another parser?