[Chicago] understanding unicode problems

Kumar McMillan kumar.mcmillan at gmail.com
Fri Nov 16 17:01:18 CET 2007


On Nov 16, 2007 9:07 AM, Carl Karsten <carl at personnelware.com> wrote:
> Kumar McMillan wrote:
> > I wrote up a little something about it when it finally clicked for me:
> > http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/
> > (I was in the same spot, I knew I *should* use UTF-8 but wasn't sure
> > how or why or what that even implied)
>
> "However, it's not always possible to work with unicode all the time because not
> everything supports it. As just one example, you'll need to create a wrapper
> that temporarily encodes / decodes data when reading a csv file using the
> standard csv module."
>
> Is there a standard way of encoding?

I suppose the standard way is to find all the boundaries of your
application (where you accept strings from files or user input),
decode everything to unicode there, and then deal with it everywhere
internally as unicode.  Whenever you need to send output to stdout, a
file, or anything else, you encode it.
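
Here is a rough sketch of that boundary idea.  It is written with
Python 3 names, where `str` is what this thread calls unicode and
`bytes` is the bytestring.  The helpers read_text and write_text are
made up for illustration, and UTF-8 is only an assumed encoding; use
whatever your data is really in.

```python
import io


def read_text(stream, encoding="utf-8"):
    """Input boundary: read bytes, hand back text (hypothetical helper)."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding="utf-8"):
    """Output boundary: take text, write bytes (hypothetical helper)."""
    stream.write(text.encode(encoding))


# Stand-in for a file opened in binary mode; the bytes are UTF-8 for "café".
incoming = io.BytesIO(b"caf\xc3\xa9")
text = read_text(incoming)

# Everything between the two boundaries works on text only.
shouted = text.upper()

# Stand-in for an output file opened in binary mode.
outgoing = io.BytesIO()
write_text(outgoing, shouted)
```

The point is just that decode and encode each appear in exactly one
place, and the code in the middle never sees raw bytes.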

>
> A string (unicode or not) is a bunch of bytes.  unicode chars may use more than
> one byte.

unicode is actually represented as a sequence of "code points"; you
shouldn't think of it as bytes while it's "unicode."  Bytes only come
into the picture once you encode it.
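
A small illustration of code points versus bytes, again using Python 3
names (`str` for unicode, `bytes` for the bytestring).  The sample word
is only an example I picked:

```python
text = "caf\u00e9"  # four characters; the last one is U+00E9

# Each character is one code point, whatever its eventual byte length.
code_points = [ord(ch) for ch in text]

# Bytes only exist after choosing an encoding.
utf8_bytes = text.encode("utf-8")

text_length = len(text)        # counts code points
byte_length = len(utf8_bytes)  # counts bytes in this one encoding
```

The two lengths differ because U+00E9 takes two bytes in UTF-8, which
is the "more than one byte" effect mentioned above.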

> What I don't understand:  Why do I need to encode / decode?

Because you can't write unicode to a file directly, for example.  A
file contains bytes, and unicode has no single byte representation:
the bytes depend on the encoding you choose.  When you encode unicode
as UTF-8 the bytestring will generally look different than if you
encode it as LATIN-1.  The reason this is so confusing is that Python
will **try** to do the encoding/decoding for you automatically (with
the ASCII codec, by default).  This is also why the errors you see are
often so puzzling (if you don't know Python is doing this under the
hood).
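
To make that concrete, here is a sketch in Python 3 names (`str` for
unicode, `bytes` for the bytestring).  Python 3 no longer converts
implicitly, so the ASCII decode below is spelled out by hand to
reproduce the kind of failure the automatic conversion produces; the
sample word is just an assumption for the example.

```python
text = "caf\u00e9"

# Same text, two encodings, two different bytestrings.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
encodings_differ = as_utf8 != as_latin1

# Decoding non-ASCII bytes with the ASCII codec fails. An implicit
# conversion does the same thing behind your back.
try:
    as_utf8.decode("ascii")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# Decoding with the encoding that produced the bytes round-trips cleanly.
round_trip = as_latin1.decode("latin-1")
```

If the traceback mentions the ascii codec and you never asked for
ASCII, an implicit conversion is a likely suspect.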

>  I get
> the feeling the error caused is a reminder "so that you know that you need to do
> the other operation later."

If you post a little bit more of the error I can try to give some
specific suggestions for solving it.  I wasn't clear on exactly which
code was raising the exception you posted earlier.

K

