'ascii' codec can't encode character u'\xf3'

Tue Aug 17 14:17:41 EDT 2004

Martin Slouf wrote:
>>- print a repr() of the unicode object instead of
>>  the unicode object itself. This will work on all
>>  terminals, and show hex escapes of non-ASCII characters.
> 
> 
> just to make sure:
> 
> override the object's __repr__(self) method to something like:
> 
> class my_string(str):
>     def __repr__(self):
>         tmp = unicode(self.attribute1 + " " + self.attribute2)
>         return tmp
> 
> and use 'my_string' class without any worries instead of classical
> string?

No. There is no need to override __repr__ or to subclass anything.
Assume yyy is a Unicode object which potentially contains characters
that your terminal cannot display. Instead of doing

    print yyy

do

    print repr(yyy)
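
To illustrate why the escaped form is safe, here is a minimal sketch. It
first reproduces the failure from the subject line and then produces an
ASCII-only version of the same text. The "backslashreplace" error handler
is my choice for the illustration, not something taken from the discussion
above; it is assumed to be available (Python 2.3 and later). It gives the
same kind of hex escapes that repr() gives for a Unicode object in
Python 2. In Python 3, the builtin ascii() plays that role.

```python
# A Unicode string with one non-ASCII character (o with diaeresis, U+00F6).
text = u"Martin v. L\xf6wis"

# Encoding to plain ASCII fails, which is the error in the subject line.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Replacing unencodable characters with backslash escapes always succeeds
# and yields output that any terminal can display.
escaped = text.encode("ascii", "backslashreplace")

print(ascii_ok)
print(escaped)
```

The escaped result contains only ASCII bytes, so it can be printed no
matter how the terminal or locale is configured.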

> my system is debian GNU/Linux stable, I have been using it for a very,
> very long time, though I have not changed any terminal settings beyond
> the very basics.  My locales are properly set, I'm using LC_* environment
> variables to set the default locale to a Czech environment with the
> ISO-8859-2 charset.  The terminal is capable of displaying 8-bit
> charsets, I'm not sure about Unicode charsets -- never tried, never needed.

I see. Could it be that you are using Python 2.1, then? Because in
Python 2.3, printing Czech characters to the terminal should work
just fine. Please run the following in the interactive interpreter
and check which encoding it reports:

Python 2.3.4 (#2, Aug  5 2004, 09:33:45)
[GCC 3.3.4 (Debian 1:3.3.4-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> sys.stdout.encoding
'ISO-8859-15'

> if 0:
>     # Enable to support locale aware default string encodings.
>     import locale
>     loc = locale.getdefaultlocale()
>     if loc[1]:
>         encoding = loc[1]
> 
> so i guess it is never done :(

You don't need to change the default encoding. Instead,
sys.stdout.encoding is used for printing to the terminal (in 2.3 and
later).
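
As a rough sketch of what this means in practice: the encoding is taken
from the output stream, not from the default encoding in site.py. The
helper name encode_for_terminal and the FakeTerminal class below are
hypothetical, made up for this illustration. Falling back to ASCII with
"replace" is also my assumption; print itself does not replace characters
but raises an error when one cannot be encoded.

```python
import sys

def encode_for_terminal(text, stream=None):
    """Encode a Unicode string with the encoding the stream reports.

    Hypothetical helper: falls back to ASCII when the stream reports no
    encoding (e.g. a pipe), and substitutes '?' for unencodable characters.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")

class FakeTerminal(object):
    """Stand-in for a terminal whose locale uses ISO-8859-2."""
    encoding = "ISO-8859-2"

# U+00F3 (o with acute) is a single byte in ISO-8859-2.
print(repr(encode_for_terminal(u"\xf3", FakeTerminal())))
```

The point is only that the stream's own encoding attribute decides how
the text is encoded; nothing in site.py has to be edited for that.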

> did you change it yourself?

No. It will work out of the box.

> well, if a piece of information like the one you gave me were contained
> in the standard python documentation, there would probably be less
> misunderstanding about this issue.

What piece specifically are you referring to? It is all mentioned
in the standard Python documentation.

> #! /usr/bin/env python
> # -*- coding: UTF-8 -*-
> at the beginning of every one of my scripts, the example above still has to 
> be converted -- because of the iso-8859-1 you use in "Löwis"?

Yes, and no. Yes, it still has to be converted. UTF-8 is *not*
Unicode; it is a byte encoding, and you cannot mix Unicode
strings and byte strings. No, if I use UTF-8 in my source code,
then "Löwis" will be encoded in UTF-8, not in ISO-8859-1.
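
A small sketch of the difference, assuming nothing beyond the standard
codecs: the same name gives different byte strings in UTF-8 and in
ISO-8859-1, and either one has to be decoded with the matching codec
before it can be combined with a Unicode string. The variable names are
made up for the illustration.

```python
# The same text, as the byte strings a source file would contain under
# each of the two encodings.
name_utf8 = u"L\xf6wis".encode("utf-8")         # two bytes for the umlaut
name_latin1 = u"L\xf6wis".encode("iso-8859-1")  # one byte for the umlaut

# The byte strings differ, although they represent the same characters.
print(name_utf8 == name_latin1)

# Decoding with the matching codec gives back the same Unicode string,
# which can then be concatenated with another Unicode string.
full_from_utf8 = u"Martin v. " + name_utf8.decode("utf-8")
full_from_latin1 = u"Martin v. " + name_latin1.decode("iso-8859-1")
print(full_from_utf8 == full_from_latin1)
```

Decoding with the wrong codec would either fail or silently produce the
wrong characters, which is why the conversion should be explicit.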

> can I omit the conversion (i.e. is it done automatically for me, as if
> I wrote
> u"Martin v. " + unicode("Löwis", "ISO-8859-1")
> )?

You can, but you shouldn't. So I won't tell you how you could do that.

> I don't understand -- which library?

The ODBC library, for example, or PyQt.

Regards,
Martin