[XML-SIG] Processing xml files with ISO 8859-1 chars

Thomas B. Passin tpassin@home.com
Wed, 7 Nov 2001 09:43:38 -0500


It seems that this xml file should have caused an exception, since it is
not well-formed:  the actual encoding (iso-8859-1) does not match the
presumed encoding (namely, utf-8, since there is no xml declaration).  The
fact that the parse partially succeeded is disturbing.

I tried this example myself.  I am running PyXML 6.6 on Windows 2000.  I did
get an exception, but it was from the pretty-printer, not the parser.
Adding an xml declaration declaring the actual iso-8859-1 encoding did in
fact allow the program to complete properly, as expected.
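
For comparison, here is a minimal sketch of the behavior I would expect.
Note the assumptions: it uses xml.dom.minidom and the expat bindings from
the Python 3 standard library, not the PyXML Sax2 reader from the original
script, and the one-line document is a cut-down stand-in for pau.xml.  It
only shows what a conforming parser does, not what PyXML does internally.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Cut-down stand-in for pau.xml, stored as ISO-8859-1 bytes.
text = "<note><assunto>Houve mudanças nos preços?</assunto></note>"
raw = text.encode("iso-8859-1")

# No XML declaration: the parser presumes UTF-8, and the accented
# bytes are not valid UTF-8, so the document is not well-formed.
try:
    minidom.parseString(raw)
    rejected = False
except ExpatError:
    rejected = True

# With a declaration naming the real encoding, the parse succeeds
# and the accented characters survive.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>\n' + raw
doc = minidom.parseString(declared)
content = doc.documentElement.firstChild.firstChild.data
```

Here `rejected` ends up true for the undeclared bytes, and `content` holds
the full sentence, accents included, once the encoding is declared.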

Why didn't the parser complain?

Cheers,

Tom P


[Rodrigo Senra]

>
>   I don't know if I stepped in a bug or it is just my newbieness ;o)
>   Trying to parse the file:
>
> ------------------- pau.xml -------------------
> <note>
>   <assunto>
>    This line is ok.
>    This line has characters  ISO-8859-1 with accents: Houve mudanças nos preços?
>    Linha ok.
>   </assunto>
> </note>
> ------------------ end of file pau.xml --------
>
> with the script:
>
> ------------------ file teste.py ----------------------------
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import PrettyPrint
>
> doc = Sax2.FromXmlStream(open('pau.xml'))
> PrettyPrint(doc,encoding='iso-8859-1')
> -------------------- end of teste.py script ------------
>
> produces:
>
> ----------- stdout trace -------------
> <?xml version='1.0' encoding='iso-8859-1'?>
> <!DOCTYPE note>
> <note>
>    <assunto>
>    This line is ok.
>
>    Linha ok.
>   </assunto>
> </note>
> ----------- end of trace -------------
>
> Am I doing something obviously wrong ? Should I try another parser ?