Newbie question: Unicode hiccup on reading file i just wrote
Diez B. Roggisch
deets at nospam.web.de
Mon Jan 30 23:33:25 CET 2006
> hi all, i have a newbie problem arising from writing-then-reading a
> unicode file, and i can't work out what syntax i need to read it in.
> the syntax i'm using now (just using quick hack tmp files):
> print ss
> ## rrr=xml.dom.minidom.parseString(f.read()) # originally
> barfs with this error:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in
> position 5092: ordinal not in range(128)
> any ideas?
You're doing things triple-time, and this time that isn't even half as good.
The first call gives you a file that returns unicode objects when read, and
the second wraps a normal, non-encoding-aware file to make it encoding-aware.
The result is that reading from the former already yields a unicode object,
which is then passed to the second wrapper. That wrapper silently passes the
unicode object through, but it is useless.
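To make the layering concrete, here is a small sketch in Python 3. The thread predates Python 3 and the original poster's exact calls aren't shown, so the in-memory streams below are assumptions standing in for the real file. One `codecs` reader turns bytes into text; a second UTF-8 reader stacked on top of that text has nothing left to decode. In Python 3 that second layer fails loudly with `TypeError`.

```python
import codecs
import io

# Bytes as they would sit on disk; U+2019 is the character from the error above.
raw = "it\u2019s".encode("utf-8")

# Layer 1: an encoding-aware reader decodes bytes into text (str).
reader = codecs.getreader("utf-8")(io.BytesIO(raw))
text = reader.read()

# Layer 2: a second UTF-8 reader stacked on already-decoded text has nothing
# to decode. Python 3 rejects this with TypeError.
double = codecs.getreader("utf-8")(io.StringIO(text))
try:
    double.read()
    second_layer_failed = False
except TypeError:
    second_layer_failed = True
```

One decoding step is all a file needs; every extra layer is either a no-op or an error.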
And then you try to pass that unicode object of yours to minidom.
But guess what: the minidom parser expects a (byte) string, because it reads
the XML encoding declaration (defaulting to UTF-8 when there is none) and
decodes the contents itself. So the unicode object you pass is first
implicitly converted to a byte string using Python's default ASCII codec,
which cannot represent u'\u2019'. That conversion raises the exception you see.
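A minimal Python 3 sketch of the alternative: encode once when writing, then read the file back in binary mode and hand the raw bytes to `xml.dom.minidom.parseString`, which decodes them according to the document's own encoding declaration. The helper `write_then_parse` and the sample document are hypothetical, not from the thread; in Python 3, `bytes` plays the role that the Python 2 `str` type played in the original post.

```python
import os
import tempfile
import xml.dom.minidom


def write_then_parse(text, encoding="utf-8"):
    """Hypothetical helper: write `text` as encoded bytes, read back, parse."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        # Writing: encode exactly once, explicitly, into bytes.
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode(encoding))
        # Reading: no codecs wrapper at all; read raw bytes in binary mode
        # and let minidom decode them using the XML encoding declaration.
        with open(path, "rb") as f:
            raw = f.read()
        return xml.dom.minidom.parseString(raw)
    finally:
        os.remove(path)


# Hypothetical sample containing U+2019, the character from the error above.
doc = write_then_parse('<?xml version="1.0" encoding="utf-8"?><note>it\u2019s ok</note>')
note_text = doc.documentElement.firstChild.data  # decoded text (str)
```

Passing a filename to `xml.dom.minidom.parse` does the same binary read for you, so the explicit `open(path, "rb")` is only there to mirror the read-then-parse shape of the original code.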
Just don't do any fancy encoding stuff at all; a simple