[Tutor] Python and unicode
Ferry Dave Jäckel
dave.jaeckel at arcor.de
Sat Mar 11 18:06:16 CET 2006
Hi Michael and Kent,
thanks to your tips I was able to solve my problems! It was quite easy in the
end.
For those interested and struggling with utf-8, ascii and unicode:
Once I understood the right way, namely
- string.decode() upon input (if in question)
- string.encode() upon output (more often than not)
where input and output mean reading from and writing to files, file-like
objects, databases... and functions of some modules that are not unicode-proof,
I got rid of all the calls to encode() and decode() that I had added by trial
and error and which messed it all up. Now I have just a few calls to encode()
and voilà! xml.sax seems to read and decode the utf-8 encoded xml file
perfectly well, and so do ZipFile.read() and file.write() - no encoding or
decoding needed.
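For illustration, here is a minimal sketch of that "decode on input, encode on output" rule. It is written in Python 3 syntax, which is an assumption on my side: under python2.4 the unicode and str types play the roles that str and bytes play here. The helper names read_text and write_text are made up for the example, and the io.BytesIO objects merely stand in for real files.

```python
import io


def read_text(raw_stream, encoding="utf-8"):
    """Decode once, at the input boundary (hypothetical helper)."""
    return raw_stream.read().decode(encoding)


def write_text(raw_stream, text, encoding="utf-8"):
    """Encode once, at the output boundary (hypothetical helper)."""
    raw_stream.write(text.encode(encoding))


# A byte stream standing in for a utf-8 encoded input file.
source = io.BytesIO(u"J\u00e4ckel".encode("utf-8"))

name = read_text(source)   # unicode text from here on
shout = name.upper()       # all processing happens on text, never on bytes

# A byte stream standing in for the output file.
sink = io.BytesIO()
write_text(sink, shout)
```

Everything between the two helpers works on unicode text only, so no further encode() or decode() calls are needed in the middle of the program.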
To me it was very important to realize that utf-8 is *not* unicode,
although I had already read about this topic (and you can read this advice
often here on this list).
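A small sketch of that difference, again in Python 3 syntax (my assumption, not what I run here): the unicode text is a sequence of code points, while utf-8 and latin-1 are just two different byte encodings of it. The variable names are only for illustration.

```python
# Unicode text: six code points, independent of any byte encoding.
text = u"J\u00e4ckel"

# Two different byte representations of the same text.
as_utf8 = text.encode("utf-8")      # the a-umlaut becomes two bytes
as_latin1 = text.encode("latin-1")  # the a-umlaut becomes one byte

text_length = len(text)
utf8_length = len(as_utf8)
latin1_length = len(as_latin1)

# Decoding with the matching codec gives back the same unicode text.
round_trip_ok = (as_utf8.decode("utf-8") == text
                 and as_latin1.decode("latin-1") == text)
```

The byte strings differ in content and length, yet both decode to the same text, which is why it matters to know which encoding a given input really uses.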
On my system sys.stdout and sys.stderr seem to have a utf-8 and a None
encoding, respectively (Kubuntu Linux, python2.4, ipython and konsole as
terminal).
The wrapper suggested by Kent
import codecs, sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'backslashreplace')
sys.stderr = codecs.getwriter('ascii')(sys.stderr, 'backslashreplace')
solves all my output problems regarding debugging.
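To show what such a wrapper does to the output, here is a hedged sketch that wraps in-memory byte streams instead of the real sys.stdout and sys.stderr, so the result can be inspected. It uses Python 3 syntax and io.BytesIO, both assumptions of the example; the buffer names are made up.

```python
import codecs
import io

text = u"J\u00e4ckel"

# Stand-in for a utf-8 terminal: the umlaut is written as utf-8 bytes.
utf8_buf = io.BytesIO()
utf8_out = codecs.getwriter("utf-8")(utf8_buf, "backslashreplace")
utf8_out.write(text)

# Stand-in for an ascii-only stream: the umlaut cannot be encoded, so
# 'backslashreplace' writes an escape sequence instead of raising
# UnicodeEncodeError.
ascii_buf = io.BytesIO()
ascii_err = codecs.getwriter("ascii")(ascii_buf, "backslashreplace")
ascii_err.write(text)
```

With the ascii writer the non-ascii character shows up as \xe4 in the output, which is not pretty but keeps debug output from crashing the program.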
Thank you for your help!
Dave
P.S.: The quotations in my signature are by chance, really. Normally I'm not
the kind of guy who believes in providence... ;)
--
I never realized it before, but having looked that over I'm certain I'd
rather have my eyes burned out by zombies with flaming dung sticks than work
on a conscientious Unicode regex engine.
-- Tim Peters, 3 Dec 1998