[Chicago] understanding unicode problems

Fri Nov 16 16:07:40 CET 2007

Kumar McMillan wrote:
> On Nov 15, 2007 4:13 PM, Carl Karsten <carl at personnelware.com> wrote:
>> of course now a unicode problem just hit me.
>>
>> I use the Django admin to enter Ivan Krstić
>> and reportlab spits out: http://dev.personnelware.com/carl/a/IvanK1.pdf
>>
>> So it's pretty much 100% Python.
>>
>> I am told:
>>
>>  > Make sure that you are using utf-8 and not some other encoding, such as
>>  > latin-1.
>>
>> But I really don't know what that means, nor do I even know how to debug this.
> 
> I wrote up a little something about it when it finally clicked for me:
> http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/
> (I was in the same spot, I knew I *should* use UTF-8 but wasn't sure
> how or why or what that even implied)
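
Here is a small sketch of what the utf-8 vs. latin-1 advice means in practice. It uses modern Python 3, not the Python 2 of this thread (there, str was bytes and unicode was a separate type). It assumes the name was typed with "ć" (U+0107). The helper `can_encode` is hypothetical and written only for illustration. Latin-1 has no byte for "ć" at all, and decoding UTF-8 bytes as latin-1 does not raise an error. It quietly produces garbage instead.

```python
def can_encode(text, codec):
    """Hypothetical helper: True if every character of text fits in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

name = "Ivan Krsti\u0107"            # "Ivan Krstić", with U+0107

# UTF-8 can represent any Unicode character; "ć" becomes two bytes.
utf8_bytes = name.encode("utf-8")    # b'Ivan Krsti\xc4\x87'

# Latin-1 only covers U+0000..U+00FF, so "ć" cannot be encoded.
fits_utf8 = can_encode(name, "utf-8")      # True
fits_latin1 = can_encode(name, "latin-1")  # False

# Decoding UTF-8 bytes with the wrong codec succeeds but gives mojibake:
# each of the two bytes of "ć" turns into its own (wrong) character.
garbled = utf8_bytes.decode("latin-1")     # 'Ivan Krsti\u00c4\u0087'
```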

"However, it's not always possible to work with unicode all the time because not 
everything supports it. As just one example, you'll need to create a wrapper 
that temporarily encodes / decodes data when reading a csv file using the 
standard csv module."

Is there a standard way of encoding?
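
One common convention is a pattern rather than a formal standard. You decode bytes to text as soon as they come in, work with text inside the program, and encode back to bytes only on the way out. UTF-8 is the usual codec unless something forces another. Here is a minimal Python 3 sketch of that pattern. The function names are hypothetical.

```python
def from_outside(data: bytes) -> str:
    # Bytes arriving from a file, socket or form: decode once, at the edge.
    return data.decode("utf-8")

def to_outside(text: str) -> bytes:
    # Text leaving the program: encode once, at the edge.
    return text.encode("utf-8")

incoming = b"Ivan Krsti\xc4\x87"      # UTF-8 bytes for "Ivan Krstić"
name = from_outside(incoming)         # a str of 11 characters
outgoing = to_outside(name)           # bytes again, identical to incoming
```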

A string (Unicode or not) is a bunch of bytes, and Unicode characters may use
more than one byte.  What I don't understand: why do I need to encode / decode
at all?  I get the feeling the error is just a reminder "so that you know that
you need to do the other operation later."
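
Here is a sketch of why the two operations exist, again in Python 3. There, text (str) is a sequence of Unicode code points and bytes is a sequence of 8-bit values, so encode/decode converts between two different types. The error shows up when bytes are decoded with a codec they do not actually match. The sample strings are illustrative only.

```python
text = "Ivan Krsti\u0107"        # str: 11 code points
data = text.encode("utf-8")       # bytes: 12 bytes, "ć" takes two

# Python 3 will not guess an encoding to combine the two types.
try:
    combined = text + data
except TypeError:
    combined = None

# The latin-1 bytes for "café" are not valid UTF-8 ...
latin1_bytes = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
    bad_position = None
except UnicodeDecodeError as exc:
    bad_position = exc.start                   # index of the offending byte

# ... and errors="replace" trades the exception for U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")
```

The decode step is where you state which encoding the bytes are in. A wrong guess surfaces either as this exception or as silent garbage.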

Carl K