error when parsing xml
Diez B. Roggisch
deets at nospam.web.de
Mon Sep 5 09:03:35 EDT 2005
Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
>
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test>æøå</test></testtest>
>
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
>
> doc = minidom.parseString(test)
You need to do
doc = minidom.parseString(test.encode("utf-8"))
The reason is simple: test is not a byte string, but a unicode object.
XML parsers work with byte strings, so passing a unicode object to them
forces an implicit conversion with the default encoding, which is ascii,
and that fails on æøå. BTW, I used encode("utf-8") because the header of
your document says so. If it declared latin1, you'd need to encode to
latin1 instead. There is plenty of unicode-related material out there -
use google to search this NG or the web.
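
To make that concrete, here is a minimal sketch. It assumes the payload arrives as a unicode object; the literal assigned to test below is a hypothetical stand-in for whatever the webservice returns, and the name text is only for illustration. It first shows that an ascii conversion cannot represent the characters, then parses the explicitly encoded bytes.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# Hypothetical stand-in for the webservice payload: a unicode object,
# not a byte string. \u00e6\u00f8\u00e5 are the characters from the post.
test = (
    u"<?xml version='1.0' encoding='utf-8'?>"
    u"<!DOCTYPE testtest [ <!ELEMENT testtest (test*)>"
    u"<!ELEMENT test (#PCDATA)>]>"
    u"<testtest><test>\u00e6\u00f8\u00e5</test></testtest>"
)

# An ascii conversion cannot represent these characters, which is why
# relying on the default encoding breaks.
try:
    test.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encode explicitly to the encoding the XML declaration names, then parse.
doc = minidom.parseString(test.encode("utf-8"))
text = doc.getElementsByTagName("test")[0].firstChild.data
```

After this, text holds the three characters as a unicode object again, since minidom decodes the bytes according to the declared encoding.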
Diez