Newbie question: Unicode hiccup on reading file I just wrote
Diez B. Roggisch
deets at nospam.web.de
Mon Jan 30 17:33:25 EST 2006
Darcy wrote:
> Hi all, I have a newbie problem arising from writing-then-reading a
> unicode file, and I can't work out what syntax I need to read it in.
>
> The syntax I'm using now (just using quick hack tmp files):
> BEGIN
> f=codecs.open("tt.xml","r","utf8")
> fwrap=codecs.EncodedFile(f,"ascii","utf8")
> try:
>     ss=u''
>     ss=fwrap.read()
>     print ss
>     ## rrr=xml.dom.minidom.parseString(f.read()) # originally
> finally:
>     f.close()
> END
>
> barfs with this error:
> BEGIN
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5092: ordinal not in range(128)
> END
>
> Any ideas?
You're decoding three times over (codecs.open, EncodedFile, and then minidom), and this time that is not even half as good.
The line
f=codecs.open("tt.xml","r","utf8")
gives you a file that returns unicode objects when you read from it. And
fwrap=codecs.EncodedFile(f,"ascii","utf8")
is meant to wrap a plain, byte-oriented file so that it becomes encoding-aware. The result is that reading from the former already yields a unicode object, which is then handed to the second wrapper. That wrapper quietly passes the unicode object along, but it contributes nothing useful.
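The layering problem can be reproduced in Python 3, where the str/bytes split makes it explicit. This is a minimal sketch, not code from the thread: the in-memory buffer is a hypothetical stand-in for tt.xml, and cp1252 is an arbitrary target encoding chosen only because it has a byte for U+2019. A codecs reader (the same reader class codecs.open sets up) turns bytes into text; codecs.EncodedFile expects a byte stream underneath and returns bytes. Stacking the second on the first feeds it text, which Python 3 rejects with a TypeError rather than the UnicodeEncodeError Python 2 produced.

```python
import codecs
import io

# Hypothetical stand-in for tt.xml: UTF-8 bytes containing U+2019.
RAW = "It\u2019s".encode("utf-8")

def read_as_text():
    # Roughly what codecs.open(..., "r", "utf8") does for reading:
    # bytes go in, str comes out.
    reader = codecs.getreader("utf-8")(io.BytesIO(RAW))
    return reader.read()

def transcode_bytes():
    # EncodedFile(file, data_encoding, file_encoding) wraps a *byte* stream:
    # it decodes what it reads as file_encoding (UTF-8) and re-encodes the
    # result as data_encoding (cp1252 here, purely for illustration).
    recoder = codecs.EncodedFile(io.BytesIO(RAW), "cp1252", "utf-8")
    return recoder.read()

def double_wrapped():
    # The layering from the question: EncodedFile on top of a reader that
    # already yields str, so the inner UTF-8 decoder receives text.
    reader = codecs.getreader("utf-8")(io.BytesIO(RAW))
    recoder = codecs.EncodedFile(reader, "ascii", "utf-8")
    try:
        return recoder.read()
    except TypeError as exc:
        return exc

print(ascii(read_as_text()))            # 'It\u2019s'
print(transcode_bytes())                # b'It\x92s'
print(type(double_wrapped()).__name__)  # TypeError
```

In other words, each codecs layer should see the kind of object it was designed for: one decoding step from bytes to text, not two stacked on top of each other.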
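And then you try to pass that unicode object of yours to minidom.
But guess what: the minidom parser expects a (byte) string, because it reads
the XML encoding declaration (falling back to UTF-8 when there is none) and decodes the contents itself.
So the unicode object you pass is first converted to a byte string with Python's default ASCII codec,
which fails on the first non-ASCII character (here u'\u2019', a right single quotation mark) and
yields the exception you see.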
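A small sketch of that behaviour with Python 3's xml.dom.minidom (an illustration, not from the thread; both sample documents are made up): when handed bytes, the parser picks the character encoding from the XML declaration, and without a declaration it assumes UTF-8.

```python
from xml.dom import minidom

def text_of(xml_bytes):
    # Parse raw bytes and return the text content of the root element.
    doc = minidom.parseString(xml_bytes)
    return doc.documentElement.firstChild.data

# With an encoding declaration, the parser decodes the bytes as declared.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
# Without one, the XML default (UTF-8 here) applies.
utf8_doc = "<p>It\u2019s</p>".encode("utf-8")

print(ascii(text_of(latin1_doc)))  # 'caf\xe9'
print(ascii(text_of(utf8_doc)))    # 'It\u2019s'
```

The same byte value 0xE9 would be invalid UTF-8 on its own, which is why the parser needs the raw bytes plus the declaration rather than text that someone else has already decoded.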
Just don't do any fancy encoding stuff at all; a simple
rrr=xml.dom.minidom.parseString(open("tt.xml").read())
should do.
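The same advice as a write-then-read round trip in Python 3, sketched under a few assumptions: a temporary file stands in for tt.xml, the write_then_read helper is hypothetical, and the file is opened in binary mode so minidom receives bytes. The with blocks also close the file explicitly, which the one-liner above leaves implicit.

```python
import os
import tempfile
from xml.dom import minidom

def write_then_read(text):
    # Hypothetical helper: write a small UTF-8 XML file, then read it back
    # by passing the raw bytes straight to minidom, with no codecs layer.
    # Note: `text` is inserted unescaped, so it must not contain < or &.
    fd, path = tempfile.mkstemp(suffix=".xml")  # stand-in for tt.xml
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
            out.write("<note>{}</note>".format(text).encode("utf-8"))
        with open(path, "rb") as fh:  # binary mode: bytes, not text
            doc = minidom.parseString(fh.read())
        # minidom.parse(path) would open and read the file itself.
        return doc.documentElement.firstChild.data
    finally:
        os.remove(path)

print(ascii(write_then_read("It\u2019s")))  # 'It\u2019s'
```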
Diez