Python Unicode to String conversion

Gabriel Genellina gagsl-py2 at
Mon Sep 17 11:30:40 CEST 2007

On Mon, 17 Sep 2007 01:33:14 -0300, Richard Levasseur
<richardlev at> wrote:

> When dealing with Unicode, I've run into situations where I have
> multiple encodings in the same string, usually latin1 and utf8
> (latin1 != ascii, and latin1 != utf8, and they don't play nice
> together). So, for future readers, if you have problems dealing with
> Unicode encode and decode, try using a mix of latin1 and utf8
> encodings to figure out what's going on, and what characters are
> fubar'ing the process.

Life is easier if you follow these guidelines:
- Always work internally in Unicode (not byte strings).
- All input data (read from files, coming from an Internet connection,
  typed by the user...) should be decoded from byte strings into Unicode as
  early as possible. (You should know which encoding your data comes in, in
  each case.)
- All output data (written to files, printed to the screen, etc.) should be
  encoded from Unicode into byte strings as late as possible.
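
The guidelines above can be sketched roughly as follows. This is only an
illustration, assuming Python 3 (where str is Unicode and bytes is the byte
string type); the sample inputs and variable names are made up for the
example, and the encodings are assumed to be known in advance.

```python
# Hypothetical inputs: the same program receives bytes in two known encodings.
latin1_input = b"caf\xe9"        # "cafe" with e-acute, encoded as Latin-1
utf8_input = b"na\xc3\xafve"     # "naive" with i-diaeresis, encoded as UTF-8

# 1. Decode as early as possible, using the encoding each source is known to use.
first = latin1_input.decode("latin-1")
second = utf8_input.decode("utf-8")

# 2. Work internally in Unicode only; the original encodings no longer matter.
combined = first + " " + second

# 3. Encode as late as possible, once, in whatever encoding the output needs.
output = combined.encode("utf-8")
```

Because both inputs become Unicode before they meet, the concatenation in
step 2 cannot mix encodings, whatever encoding is chosen for the output.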

This way, unless your input data is garbage, you can never mix strings
from different encodings.
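
As a small illustration of what goes wrong when the encoding is guessed
incorrectly (again assuming Python 3; the variable names are only for this
example), the same character has different byte representations in latin1
and utf8, and decoding with the wrong one either yields wrong characters
silently or fails outright:

```python
text = "\u00e9"  # a single character: e with acute accent

latin1_bytes = text.encode("latin-1")  # one byte
utf8_bytes = text.encode("utf-8")      # two bytes

# Decoding UTF-8 bytes as Latin-1 never raises, because every byte value is
# valid Latin-1, but the result is two wrong characters instead of one.
wrong = utf8_bytes.decode("latin-1")

# Decoding Latin-1 bytes as UTF-8 usually fails for non-ASCII characters,
# because the bytes are not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```

The silent case is the dangerous one, which is why knowing the encoding of
each input matters more than trying encodings until nothing raises.
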
For further information, read the Unicode Howto
<> and this excerpt from the "Python
Cookbook", by Alex Martelli

Gabriel Genellina
