[Chicago] understanding unicode problems

Fri Nov 16 00:45:21 CET 2007

On Nov 15, 2007 5:04 PM, Carl Karsten <carl at personnelware.com> wrote:
> > In my email reader the name shows up as Ivan Krstic' (with a single
> > quote at the end), so I'm not sure what character is really at the end
> > of his name. But if Django is dealing with unicode now (which
> > Feihong says it is), then you probably just need to encode it into a
> > UTF-8 bytestream before you write to the PDF file, i.e.
> > pdf.write("%s %s" % (first_name.encode('utf-8'), last_name.encode('utf-8')))
>
> 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)
>
> # draw the string using the function that matches the alignment:
> s = obj.getProp("expr", returnException=True)
>
> if isinstance(s, basestring):
>     s = s.encode(self.Encoding) ...
> else:
>     s = unicode(s)
> func(posx, 0, s)
>
> Will that ever hit the else case?

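Not for strings.  Both str and unicode objects are instances of
basestring, so the else branch only runs when s is something other than
a string (a number, say).  The error most likely comes from the
s.encode(self.Encoding) call: when s is a str that already holds encoded
bytes, Python 2 first decodes it with the default ascii codec, and that
implicit decode chokes on 0xc4.  From the code above I'm not sure what
you're trying to do.  However, if you want to turn s into unicode you
could do: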

>>> def to_uni(str_or_uni, encoding='utf8'):
...     if isinstance(str_or_uni, str):
...         return unicode(str_or_uni, encoding)
...     elif isinstance(str_or_uni, unicode):
...         return str_or_uni
...     else:
...         raise ValueError("not a basestring instance")
...
>>> to_uni("\xc4\xa3", 'ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in to_uni
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> to_uni("\xc4\xa3", 'utf8')
u'\u0123'
>>>
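
A "decode" error coming out of an encode call can look odd.  In Python 2,
calling .encode() on a byte str first decodes it with the default codec
(normally ascii), and that hidden step is what fails.  The sketch below
spells the hidden step out explicitly.  It is written for Python 3, where
bytes plays the role of Python 2's str.  The sample bytes and the helper
name encode_like_py2 are assumptions for illustration: the bytes were
picked only so that 0xc4 lands at index 10, as in the reported error, and
are not taken from the real data.

```python
# Hypothetical sample: UTF-8 bytes with 0xc4 at index 10 (an assumption,
# chosen only to line up with "position 10" in the reported error).
raw = b"Ivan Krsti\xc4\x87"


def encode_like_py2(byte_string, target="utf-8"):
    """Mimic Python 2's str.encode(): implicit ascii decode, then encode."""
    text = byte_string.decode("ascii")  # the hidden step that fails
    return text.encode(target)


try:
    encode_like_py2(raw)
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding with the codec the bytes were actually written in succeeds,
# and re-encoding gives back the same bytes.
fixed = raw.decode("utf-8").encode("utf-8")
```

Pure-ASCII input passes through encode_like_py2 untouched, which is why
this kind of bug tends to stay hidden until the first non-ASCII name
shows up.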

Does that help illustrate what is happening in your code?  Once you
have the unicode object, you can turn it into a bytestream suitable for
printing with uni_obj.encode('utf8').
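
To make the "decode on the way in, encode on the way out" idea concrete,
here is a minimal sketch of the round trip.  It is written for Python 3
(bytes and str there correspond to Python 2's str and unicode), and the
helper names to_text and to_bytes are made up for this example.  It
assumes the incoming bytes really are UTF-8.

```python
def to_text(value, encoding="utf-8"):
    """Return text for either bytes or text input."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8"):
    """Normalise to text first, then encode once at the output boundary."""
    return to_text(value, encoding).encode(encoding)


# Hypothetical mixed inputs: one already-encoded value, one text value.
parts = [b"\xc4\xa3", "\u0123"]
line = b" ".join(to_bytes(part) for part in parts)
```

Whatever mix of byte strings and unicode objects comes in, everything is
text in the middle and bytes only at the point of writing.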

FWIW, since you only posted the first byte of the character, \xc4, I
took a guess at the second byte (\xa3) to complete the UTF-8 sequence.
It is probably not the right one:
(http://www.fileformat.info/info/unicode/char/0123/index.htm)
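
If the data is UTF-8 (an assumption), the guess can at least be narrowed
down: \xc4 is the lead byte of a two-byte sequence, so the second byte
must be a continuation byte in the range \x80 to \xbf.  The sketch below,
written for Python 3, lists every character that could start with \xc4.
The variable names are made up for the example.

```python
import unicodedata

LEAD = 0xC4  # the only byte shown in the reported error

# Map each valid continuation byte to the character it would complete.
candidates = {}
for second in range(0x80, 0xC0):
    candidates[second] = bytes([LEAD, second]).decode("utf-8")

for second, char in sorted(candidates.items()):
    name = unicodedata.name(char, "?")
    print("\\xc4\\x%02x -> U+%04X %s" % (second, ord(char), name))
```

That leaves 64 candidates, U+0100 through U+013F, all accented Latin
letters, so seeing the second byte is enough to settle which one it is.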