xml processing and sys.setdefaultencoding (more info)

Martin v. Löwis martin at v.loewis.de
Mon Jul 21 01:32:20 EDT 2003


"christof hoeke" <csad7 at yahoo.com> writes:

> the original problem with the app was that the Pyana transformation
> complained about the string "xml" when it came over as unicode. so i used
> str(xml) but that gave the usual "ordinal not in range" error when the xslt
> contained e.g. german umlauts. 

At that point, you should have done

xml = xml.encode("utf-8")

where you need to make sure that the name "utf-8" matches the
encoding= given in the XML declaration (UTF-8 is also what a parser
assumes when the declaration names no encoding).
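
As a minimal sketch of that step, assuming the document is held as a
Unicode string: the helper below reads the encoding name from the XML
declaration and encodes with it, so the two cannot disagree. The name
encode_per_declaration and the regular expression are illustrative
assumptions, not part of Pyana or any library, and the pattern ignores
rarer forms such as whitespace around the "=" sign.

```python
import re

# Hypothetical helper for illustration only; not a Pyana or stdlib API.
# Matches the encoding name inside a leading <?xml ... ?> declaration.
_DECL_ENCODING = re.compile(
    r'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._\-]+)["\']'
)


def encode_per_declaration(text, fallback="utf-8"):
    """Encode a Unicode string with the encoding its XML declaration names.

    If there is no declaration (or it names no encoding), fall back to
    UTF-8, which is what an XML parser assumes in that case.
    """
    match = _DECL_ENCODING.match(text)
    encoding = match.group(1) if match else fallback
    return text.encode(encoding)


# A stylesheet fragment containing German umlauts ("Gruesse" with u-umlaut
# and sharp s), written with escapes so the source file stays pure ASCII.
doc = u'<?xml version="1.0" encoding="utf-8"?><gruss>Gr\u00fc\u00dfe</gruss>'
data = encode_per_declaration(doc)
```

The resulting byte string in data is what you would hand to the
transformation in place of str(xml).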

> i did not try that before...  setting the default encoding to
> utf-8 fixed that. the reason is not entirely clear to me yet though.

For any Unicode object X, str(X) is equivalent to
X.encode(sys.getdefaultencoding()). Since that defaults to "ascii",
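str(X) is normally the same as X.encode("ascii"), which fails with the
"ordinal not in range" error if the string contains any non-ASCII
characters. With the default encoding set to "utf-8", str(X) becomes
X.encode("utf-8"), which can represent every character; that is why
your change made the error go away.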
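
To make the difference concrete, here is a small sketch that spells out
the two encode calls explicitly instead of going through str(). The
variable names are made up for the example. The explicit calls behave
the same in Python 2 and Python 3; the str(X) behaviour described above
applies to Python 2 only.

```python
# "Gruesse" with u-umlaut and sharp s, written with escapes so the
# source file itself stays pure ASCII.
text = u"Gr\u00fc\u00dfe"

# What str(text) amounts to in Python 2 with the default encoding "ascii".
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeError:
    # The umlaut has no ASCII representation: "ordinal not in range(128)".
    ascii_ok = False

# Pure ASCII text goes through without trouble, which is why the problem
# only shows up once the stylesheet contains umlauts.
plain = u"hello".encode("ascii")

# Naming the encoding explicitly works regardless of the default encoding.
utf8_bytes = text.encode("utf-8")
```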

> it is my first "bigger" python project, so the code is not the best
> i guess and the version which does not work is still online. i need
> to put on the version with the changed default encoding.

I advise that you get rid of the need to set the default
encoding. On many users' installations it will have a value different
from "utf-8" (normally "ascii"), and your application would fail there.

Regards,
Martin




