writing Unicode objects to XML

Alessio Pace puccio_13 at yahoo.it
Tue May 6 03:52:14 EDT 2003


Martin v. Löwis wrote:

> Alessio Pace <puccio_13 at yahoo.it> writes:
> 
>> I mean, they are strange characters, but the
>> weird thing is that parsing the file.xml over again I get the same
>> Unicode objects as when I read it for the first time with character
>> references as content.
> 
> Why do you think they are strange characters? Because your editor
> tells you? Don't trust your editor, then - use an editor instead that
> properly understands XML encodings, or, at least, can be configured to
> use encodings different from your locale's encoding.


Well, I use VIM, and maybe that's the issue. Anyway, when I print the weird
characters on the console after parsing the XML with Python, they are exactly
what I expect (encoded in my system's default encoding, 'iso-8859-1'), so I
see an accented e, for instance.
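
To illustrate why the same character can look fine on the console but
strange in an editor, here is a minimal sketch. It assumes Python 3 and
uses U+00E9 (e with acute accent) purely as an example character; it is
not code from this thread.

```python
# One character, two encodings: the bytes on disk differ.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = text.encode("iso-8859-1")  # a single byte
utf8_bytes = text.encode("utf-8")         # two bytes

# An editor or console that assumes iso-8859-1 shows the two UTF-8
# bytes as two separate characters instead of one accented e.
misread = utf8_bytes.decode("iso-8859-1")

print(latin1_bytes, utf8_bytes, repr(misread))
```

So "strange characters" in the editor may only mean that the editor
decodes the file with a different encoding than the one it was written in.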

> 
> Or did you perhaps try printing it to the console? Same issue: You
> need to get a different console, one that uses the encoding of the XML
> file, not some encoding that the vendor of the console software thought
> might be a good idea to use. That particular vendor may be interested
> in compatibility with a system developed 20 years ago, whereas you
> are using an encoding that is just 10 years old.
> 
> Regards,
> Martin

Well, I read the discussion above about XML, but I still can't understand why
my XML document does *not* get parsed correctly if, instead of character
references, I write UTF-8 byte sequences (I get them double escaped, as I
said previously in the thread).
Anyway, if the "original" XML contains only character references, then
reading and writing do everything that is expected, and continue to work.
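
For comparison, a minimal sketch of the two cases. It assumes Python 3
and the standard library's xml.dom.minidom (the thread does not say which
parser or writer was actually used), and the element name "a" and the
helper text_of are made up for the example. The last part shows one
possible way to end up with doubled bytes, namely decoding UTF-8 as
iso-8859-1 and encoding again; that is only a guess at the cause, not a
diagnosis.

```python
from xml.dom import minidom

# Case 1: ASCII-only document using a character reference.
with_ref = b'<?xml version="1.0"?><a>&#233;</a>'

# Case 2: the same character as raw UTF-8 bytes, with a matching
# encoding declaration.
with_utf8 = '<?xml version="1.0" encoding="utf-8"?><a>\u00e9</a>'.encode("utf-8")


def text_of(xml_bytes):
    """Hypothetical helper: return the text inside the root element."""
    doc = minidom.parseString(xml_bytes)
    return doc.documentElement.firstChild.data


# Both documents parse to the same one-character string.
same = text_of(with_ref) == text_of(with_utf8) == "\u00e9"

# Writing the parsed document back out as UTF-8 and parsing it again
# keeps the text unchanged.
written = minidom.parseString(with_utf8).toxml(encoding="utf-8")
round_trip = text_of(written)

# Assumed failure mode: UTF-8 bytes mistaken for iso-8859-1 and then
# encoded to UTF-8 a second time.
double_encoded = "\u00e9".encode("utf-8").decode("iso-8859-1").encode("utf-8")

print(same, repr(round_trip), double_encoded)
```

If the bytes in the file look like the last value, something between the
parser and the file has encoded the text twice.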

Thanks again to everybody.

-- 
bye
Alessio Pace



