Parsing XML with ElementTree (unicode problem?)

oren.tsur at gmail.com oren.tsur at gmail.com
Tue Jul 24 07:57:26 CEST 2007


On Jul 23, 4:46 pm, "Richard Brodie" <R.Bro... at rl.ac.uk> wrote:
> <oren.t... at gmail.com> wrote in message
>
> news:1185200976.082516.105420 at 57g2000hsv.googlegroups.com...
>
> > so what's the difference? how comes parsing is fine
> > in the first case but erroneous in the second case?
>
> You may have guessed the encoding wrong. It probably
> wasn't utf-8 to start with but iso8859-1 or similar.
> What actual byte value is in the file?

I tried it with different encodings and it didn't work. Anyway, I
would expect it to be UTF-8, since the XML response to the Amazon query
declares UTF-8. Check it by opening
http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=189P5TE3VP7N9MN0G302&Operation=ItemLookup&ItemId=1400079179&ResponseGroup=Reviews&ReviewPage=166
in your browser: the first line of the source is
<?xml version="1.0" encoding="UTF-8"?>

But the thing is that the parser handles the response fine when it
comes straight from the web (the Amazon response), yet fails on the
locally saved file.
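
One way to narrow this down is to look at the actual bytes in the saved
file, as suggested above. The sketch below is only an illustration under
stated assumptions: the XML string is a made-up stand-in for the Amazon
response, and the helper names non_ascii_bytes and save_and_parse are
hypothetical. It writes the same document once as UTF-8 and once as
ISO-8859-1 (which is what a tool that re-encodes on save might produce
while leaving the UTF-8 declaration in place), lists the non-ASCII byte
values, and parses both files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up stand-in for the response body; not the real Amazon data.
xml_text = (u'<?xml version="1.0" encoding="UTF-8"?>'
            u'<Review><Content>caf\u00e9 tr\u00e8s bon</Content></Review>')

good_bytes = xml_text.encode("utf-8")       # bytes match the declaration
bad_bytes = xml_text.encode("iso-8859-1")   # bytes contradict the declaration


def non_ascii_bytes(data):
    """Return (offset, hex value) for every byte above 0x7F."""
    return [(i, hex(b)) for i, b in enumerate(bytearray(data)) if b > 0x7F]


def save_and_parse(data):
    """Write the bytes verbatim (binary mode) and parse the saved file."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return ET.parse(path).getroot()
    finally:
        os.remove(path)


# UTF-8 bytes saved untouched parse without trouble.
parsed_text = save_and_parse(good_bytes).find("Content").text

# In UTF-8 each accented letter is two bytes; in ISO-8859-1 it is one.
print(non_ascii_bytes(good_bytes))
print(non_ascii_bytes(bad_bytes))

# Latin-1 bytes under a UTF-8 declaration are rejected by the parser.
try:
    save_and_parse(bad_bytes)
    bad_parse_failed = False
except ET.ParseError as exc:
    bad_parse_failed = True
    print("parse failed:", exc)
```

If the saved file shows single bytes such as 0xe9 where the live response
has two-byte sequences such as 0xc3 0xa9, the saving step changed the
encoding, and writing the response bytes in binary mode should avoid that.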

> > 2. there is another problem that might be similar I get a similar
> > error if the content of the (locally saved) xml have special
> > characters such as '&'
>
> Either the originator of the XML has messed up, or whatever
> you have done to save a local copy has mangled it.

I think I made a mess. I changed the '&' in the original response to
'and' because the parser failed on the '&' (in the locally saved
file), just as it failed on the French characters. Again, parsing
the original response was just fine.
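
For reference, a bare '&' is not well-formed XML; inside element text it
has to be written as the entity &amp;. A saved copy that contains a bare
'&' has therefore probably had its entities decoded along the way, for
example by copying the text from a browser's rendered view rather than
the raw source. The snippet below is a small sketch of that difference;
the title string is a hypothetical example, not taken from the Amazon
response.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_title = "Pride & Prejudice"  # hypothetical text containing '&'

# A bare '&' in element content is rejected by the parser.
bare = "<Title>" + raw_title + "</Title>"
try:
    ET.fromstring(bare)
    bare_ok = True
except ET.ParseError as exc:
    bare_ok = False
    print("bare '&' failed:", exc)

# Escaped as &amp; it parses, and the parser hands back the plain '&'.
escaped = "<Title>" + escape(raw_title) + "</Title>"
title = ET.fromstring(escaped).text
print(escaped)
print(title)
```

So there should be no need to replace '&' with 'and' by hand: if the
file keeps the entities exactly as the server sent them, the parser
decodes them itself.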

Thanks again,

Oren




