[Chicago] understanding unicode problems

Carl Karsten carl at personnelware.com
Fri Nov 16 18:27:09 CET 2007


That may happen.

Given that I am getting on a plane in 7 hours, don't count on it.  But if it 
makes/breaks the proposal submission, I'll offer to help more, even to the point 
of taking over and doing the talk if it comes down to that.

Why do I volunteer for these things?

Carl K

Feihong Hsu wrote:
> I learned a lot about how to handle Unicode in Python when I gave my 
> talk on it back in March. So clearly, the best way to understand Unicode 
> is to give a talk on it. That's why you should give the talk, Carl. 
> We'll be here to help you out ;-)
> 
> -Feihong
> 
> */Carl Karsten <carl at personnelware.com>/* wrote:
> 
>     Kumar McMillan wrote:
>      > On Nov 16, 2007 9:07 AM, Carl Karsten wrote:
>      >> Kumar McMillan wrote:
>      >>> I wrote up a little something about it when it finally clicked for me:
>      >>>
>      >>> http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/
>      >>> (I was in the same spot, I knew I *should* use UTF-8 but wasn't sure
>      >>> how or why or what that even implied)
>      >> "However, it's not always possible to work with unicode all the time
>      >> because not everything supports it. As just one example, you'll need to
>      >> create a wrapper that temporarily encodes / decodes data when reading a
>      >> csv file using the standard csv module."
>      >>
>      >> Is there a standard way of encoding?
>      >
>      > I suppose the standard way is to find all the boundaries of your
>      > application (where you accept strings from files or user input) and
>      > convert it all to unicode then deal with it everywhere internally as
>      > unicode. Whenever you need to send output to stdout, a file,
>      > whatever, then you encode it.
>      >
>      >> A string (unicode or not) is a bunch of bytes. unicode chars may use
>      >> more than one byte.
>      >
>      > unicode is actually represented internally as "code points;" it's not
>      > stored in bytes while it's "unicode."
> 
>     Um, what's a "code point"? and what are you calling "bytes", cuz in my
>     vocabulary, everything is stored as a set of bytes, those 8 bit things that
>     the CPU reads and writes to ram and disk drives.
> 
>      >
>      >> What I don't understand: Why do I need to encode / decode?
>      >
>      > Because you can't write unicode to a file, for example. A file
>      > contains bytes and unicode has arbitrary byte representations. When
>      > you encode unicode as UTF-8 the bytestring will look different than if
>      > you encode it as LATIN-1. The reason this is so confusing is that
>      > Python will **try** to do the encoding/decoding for you automatically.
>      > This is also why the errors you see are often very confusing (if you
>      > don't know Python is doing this under the hood).
>      >
> 
>     This will make more sense once I get a grip on what a byte is.
> 
> 
>      >> I get the feeling the error caused is a reminder "so that you know
>      >> that you need to do the other operation later."
>      >
>      > if you post a little bit more of the error I can try and give some
>      > specific suggestions for solving it. I wasn't clear exactly what code
>      > was raising the exception you posted earlier.
> 
>     code that errored wasn't mine - it was Paul's, and I think he fixed it. I am
>     back to helping flesh out your unicode talk :)
> 
>     Carl K
>     _______________________________________________
>     Chicago mailing list
>     Chicago at python.org
>     http://mail.python.org/mailman/listinfo/chicago
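
For anyone reading this thread later, here is a small sketch of the points
Kumar makes above: code points versus bytes, how the same text encodes
differently as UTF-8 and LATIN-1, and decoding/encoding at the boundaries of
a program. It is not from the thread. It assumes Python 3, where `str` is the
unicode type and `bytes` is the byte string (in the Python 2 of this thread
those were `unicode` and `str`). The sample text and the helper names
read_text/write_bytes are made up for illustration.

```python
# Illustrative sketch only; assumes Python 3. All names are hypothetical.

text = "caf\u00e9"  # four characters: c, a, f, e-with-acute-accent

# A code point is just a number assigned to a character. It says nothing
# about how many bytes the character will take on disk.
code_points = [ord(ch) for ch in text]

# Encoding chooses a concrete byte layout, and the layout depends on the codec.
as_utf8 = text.encode("utf-8")      # the accented character takes two bytes
as_latin1 = text.encode("latin-1")  # the accented character takes one byte


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input boundary: turn incoming bytes into text once, on the way in."""
    return raw.decode(encoding)


def write_bytes(value: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: turn text back into bytes once, on the way out."""
    return value.encode(encoding)


# Inside the program everything stays as text; bytes exist only at the edges.
round_trip = read_text(write_bytes(text))

# Decoding with the wrong codec is where the confusing errors come from.
try:
    as_latin1.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

The design choice is the one described in the thread: pick the encoding
explicitly at each boundary instead of relying on an implicit default.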

