[Chicago] understanding unicode problems

Kumar McMillan kumar.mcmillan at gmail.com
Thu Nov 15 23:50:32 CET 2007


On Nov 15, 2007 4:13 PM, Carl Karsten <carl at personnelware.com> wrote:
> of course now a unicode problem just hit me.
>
> i use the  django admin to enter  Ivan Krstic'
> and reportlab spits out: http://dev.personnelware.com/carl/a/IvanK1.pdf
>
> so pretty much 100% python.
>
> I am told:
>
>  > Make sure that you are using utf-8 and not some other encoding, such as
>  > latin-1.
>
> But I really don't know what that means, nor do I even know how to debug this.

I wrote up a little something about it when it finally clicked for me:
http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/
(I was in the same spot: I knew I *should* use UTF-8 but wasn't sure
how or why, or what that even implied.)

In my email reader the name shows up as Ivan Krstic' (with a single
quote at the end), so I'm not sure what character is really at the end
of his name.  But if Django is dealing with unicode now (which Feihong
says it is), then you probably just need to encode it into a UTF-8
byte string before you write to the PDF file, i.e.:

    pdf.write("%s %s" % (first_name.encode('utf-8'),
                         last_name.encode('utf-8')))

That all assumes that first_name and last_name come from the db and
through Django to you as unicode objects.  If they *do not*, then you
have a lot more work to do!
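
Here is a rough sketch of the encode-before-writing idea, in case it
helps to see it run.  It is written for Python 3, where str plays the
role of the unicode type.  Everything in it is an assumption made for
illustration: the helper name encode_name is made up, the BytesIO
object only stands in for whatever stream the PDF is written to (it is
not reportlab's API), and the sample name assumes the last character
is U+0107, which I can't confirm from the email.

```python
import io


def encode_name(first_name, last_name):
    """Join two text strings, then encode the result as UTF-8 bytes."""
    # Fail early if bytes sneak in: that is the "a lot more work" case.
    for value in (first_name, last_name):
        if not isinstance(value, str):
            raise TypeError("expected text, got %r" % type(value))
    return ("%s %s" % (first_name, last_name)).encode("utf-8")


# Assumed sample data: U+0107 is LATIN SMALL LETTER C WITH ACUTE.
first_name = "Ivan"
last_name = "Krsti\u0107"

# Stand-in for the PDF output stream (hypothetical; not reportlab's API).
pdf = io.BytesIO()
pdf.write(encode_name(first_name, last_name))
data = pdf.getvalue()

# Latin-1 has no slot for U+0107, so encoding to it fails outright.
try:
    last_name.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

The type check is there because the whole approach depends on the
values arriving as text.  If they arrive as bytes in some unknown
encoding, they have to be decoded correctly first, and no amount of
encoding at the output end will fix that.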

Yes, most other languages like PHP, Ruby, etc. barely even support
unicode, but I still think it is way more cumbersome than it has to be
in Python, and the errors one gets are unintuitive and misleading.  I
do agree with Feihong that after Python 3 there will probably still be
plenty of confusion and gotchas.  Though, it remains to be seen!

k

