not quite 1252

Serge Orlov Serge.Orlov at gmail.com
Wed Apr 26 18:52:34 EDT 2006


Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfect because there are still characters
> in it like \93 or \94. Has anyone handled this before? I'd rather not
> reinvent the wheel and start translating strings 'by hand'.

I extracted content.xml from a test file and the header is:
<?xml version="1.0" encoding="UTF-8"?>

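So any XML library should handle it just fine. Hand the parser the raw
bytes of content.xml and let it take the encoding from that declaration,
instead of trying to guess the encoding yourself.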
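Here is a minimal sketch of that approach, assuming Python 3 and only the standard library (zipfile and xml.etree.ElementTree). The function name read_sxw_text and the tiny in-memory archive are hypothetical stand-ins for a real .sxw file, not something from the original thread. The important detail is that content.xml is read as bytes and passed to the parser unchanged, so the parser decodes it according to the declared encoding. No cp1252 guessing is involved.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_sxw_text(path_or_file):
    """Return all text nodes of content.xml inside an .sxw (zip) archive.

    Hypothetical helper: the raw bytes go straight to ElementTree, which
    decodes them using the encoding named in the XML declaration (UTF-8
    for OpenOffice files), so no manual decoding step is needed.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        raw = archive.read("content.xml")  # bytes, deliberately not decoded
    root = ET.fromstring(raw)  # the parser honours encoding="UTF-8"
    return "".join(root.itertext())


# Demo: build a small in-memory archive standing in for a real .sxw file.
# Its content contains curly quotes, which show up as \x93/\x94 in cp1252.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    xml = '<?xml version="1.0" encoding="UTF-8"?><doc>\u201cquoted\u201d</doc>'
    archive.writestr("content.xml", xml.encode("utf-8"))
buf.seek(0)

text = read_sxw_text(buf)
print(text)
```

The curly quotes come back as proper Unicode characters (U+201C and U+201D). If you need cp1252 output later, encode the resulting string explicitly at that point rather than decoding the file as cp1252.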