Newbie question: Unicode hiccup on reading file i just wrote

Diez B. Roggisch deets at
Mon Jan 30 23:33:25 CET 2006

Darcy wrote:
> hi all, I have a newbie problem arising from writing-then-reading a 
> unicode file, and I can't work out what syntax I need to read it in.
> the syntax i'm using now (just using quick hack tmp files):
> fwrap=codecs.EncodedFile(f,"ascii","utf8")
> try:
>     ss=u''
>     print ss
>     ## rrr=xml.dom.minidom.parseString( # originally
> finally:
>     f.close()
> barfs with this error:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in 
> position 5092: ordinal not in range(128)
> Any ideas?

You're doing things triple-time, which this time is not even half as good:


gives you a file that will return unicode objects when reading. And

fwrap=codecs.EncodedFile(f,"ascii","utf8")
will wrap a normal, non-encoding-aware file to make it encoding-aware. 
The result is that reading from the former already yields a unicode 
object, which is then passed to the second wrapper. That wrapper silently 
passes the unicode object along, so it is useless.
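To make the layering concrete, here is a small sketch in Python 3 (the thread itself is Python 2, where the decoded type is called unicode rather than str). It reads one UTF-8 file twice: once through a plain binary open(), which returns undecoded bytes, and once through codecs.open() as an example of a reader that decodes for you, so by the time read() returns the data is already text. Once the data is text, a further decoding wrapper has nothing left to do. The sample document and the helper name read_both_ways are hypothetical.

```python
import codecs
import os
import tempfile

# Hypothetical sample document; it contains U+2019 (RIGHT SINGLE QUOTATION
# MARK), the same character named in the traceback above.
SAMPLE = '<?xml version="1.0" encoding="utf-8"?>\n<doc>It\u2019s here</doc>\n'


def read_both_ways(path):
    """Read the same file through a plain binary handle and a decoding reader."""
    # Plain binary file: you get the raw UTF-8 bytes, nothing is decoded.
    with open(path, "rb") as raw:
        raw_data = raw.read()
    # codecs.open(..., "utf-8"): decoding already happens inside read(),
    # so you get text (unicode in Python 2, str in Python 3).
    with codecs.open(path, "r", "utf-8") as decoded:
        text_data = decoded.read()
    return raw_data, text_data


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(SAMPLE.encode("utf-8"))
    try:
        raw_data, text_data = read_both_ways(path)
        print(type(raw_data).__name__, type(text_data).__name__)  # bytes str
        # The decoding reader did exactly one UTF-8 decode for you.
        print(raw_data.decode("utf-8") == text_data)  # True
    finally:
        os.remove(path)
```

Newer Python 3 documentation recommends the built-in open(path, encoding="utf-8") over codecs.open for reading text; codecs.open is used here only to stay close to the thread.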

And then you pass that unicode object of yours to minidom. But guess 
what: the minidom parser expects a (byte) string, because it reads the 
document's XML encoding declaration and decodes the contents itself. 
So the unicode object you pass is first converted to a byte string with 
the default ASCII codec, which yields the exception you see.
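Both halves of that explanation can be checked in a short Python 3 sketch (where the types are bytes and str instead of str and unicode). Handing minidom UTF-8 bytes works, because the parser decodes them itself according to the declaration; pushing the decoded text through the ASCII codec reproduces the same UnicodeEncodeError. Python 3 no longer converts implicitly, so the hypothetical helper implicit_ascii_conversion only mimics what Python 2 did behind your back; the sample document is also hypothetical.

```python
from xml.dom import minidom

# Hypothetical document containing U+2019, encoded as UTF-8 bytes.
DOC_BYTES = '<?xml version="1.0" encoding="utf-8"?><doc>It\u2019s here</doc>'.encode("utf-8")


def parse_bytes(data):
    """Hand raw bytes to minidom; the parser reads the declaration and decodes."""
    return minidom.parseString(data)


def implicit_ascii_conversion(text):
    """Mimic Python 2's implicit unicode-to-str step: encode with ASCII."""
    return text.encode("ascii")


if __name__ == "__main__":
    dom = parse_bytes(DOC_BYTES)
    # ascii() keeps the output printable even on an ASCII-only console.
    print(ascii(dom.documentElement.firstChild.data))  # 'It\u2019s here'
    try:
        implicit_ascii_conversion(DOC_BYTES.decode("utf-8"))
    except UnicodeEncodeError as exc:
        # Same failure as in the traceback: 'ascii' codec can't encode '\u2019'.
        print(exc)
```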

Just don't do any fancy encoding stuff at all; a simple


should do.
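Following that advice, the sketch below does the whole write-then-read round trip without any codecs wrapper: it writes UTF-8 bytes, reads them back with a plain binary open(), and hands the bytes straight to minidom.parseString. It is written for Python 3, and the temporary file, the helper names write_xml and read_xml and the sample text are hypothetical, not the original poster's code. minidom.parse(path) would work as well, since it opens a file name in binary mode itself.

```python
import os
import tempfile
from xml.dom import minidom


def write_xml(path, text):
    # Write the document as UTF-8 bytes, matching its encoding declaration.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


def read_xml(path):
    # No codecs wrappers: read raw bytes and let the XML parser decode them
    # according to the document's own encoding declaration.
    with open(path, "rb") as f:
        return minidom.parseString(f.read())


if __name__ == "__main__":
    # Hypothetical sample document with a non-ASCII character (U+2019).
    doc = '<?xml version="1.0" encoding="utf-8"?>\n<doc>It\u2019s here</doc>\n'
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        write_xml(path, doc)
        dom = read_xml(path)
        # ascii() keeps the output printable even on an ASCII-only console.
        print(ascii(dom.documentElement.firstChild.data))  # 'It\u2019s here'
    finally:
        os.remove(path)
```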

