'ascii' codec can't encode character u'\xf3'

Tue Aug 17 02:42:34 EDT 2004

Martin Slouf wrote:
> the solution seems to be:
> 
> 0. string is not in unicode encoding (assumption)
> 1. before printing out, convert the string to unicode
> 2. when printing, convert to whatever charset you like

There is an alternative if the print is only a debugging aid:

- print a repr() of the unicode object instead of
   the unicode object itself. This will work on all
   terminals, and show hex escapes of non-ASCII characters.
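In Python 3, where `str` is the Unicode type, repr() no longer escapes printable non-ASCII characters. The built-in ascii() produces the same hex escapes that repr() of a Python 2 unicode object did (without the u prefix). A minimal sketch, assuming Python 3:

```python
# Python 3 sketch: str is the Unicode type, so this plays the role of
# a Python 2 unicode object. '\xf3' is the character from the subject line.
text = "L\u00f6wis and \xf3"

# ascii() escapes every non-ASCII character, so the result is pure ASCII
# and can be printed on any terminal regardless of its encoding.
escaped = ascii(text)
print(escaped)  # 'L\xf6wis and \xf3'
```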

> 1. why the string is not in unicode can have several reasons -- i guess:
> 	- does ogg store tags in unicode?
> 	- you have parsed an xml file with encoding attribute set (that
> is what i do)
> 	- etc

Correct.

> 2. "replace" parameter in encode causes non-printable chars to be
> replaced with '?' (you can use "ignore" or "strict", see your python
> doc)

Correct.
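The three error handlers can be compared side by side. This sketch uses Python 3's str.encode(); Python 2's unicode.encode() accepts the same "strict", "replace" and "ignore" arguments. Note that "replace" substitutes characters the target encoding cannot represent, whether or not they are printable.

```python
text = "L\u00f6wis"

# "replace": characters the target encoding cannot represent become '?'.
replaced = text.encode("ascii", "replace")

# "ignore": such characters are silently dropped.
ignored = text.encode("ascii", "ignore")

# "strict" (the default): a UnicodeEncodeError is raised instead.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(replaced, ignored, strict_failed)  # b'L?wis' b'Lwis' True
```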

> 3. the above will work _only_ _if_ the 'str' encoding is "iso-8859-2" --
> a funny thing -- first line of code converts from unknown (but the
> programmer must know it) to unicode and the second one converts it back
> from unicode to unknown (now the programmer tells that secret to python
> :)

No. Called without an encoding argument, unicode(text) decodes with the
system default encoding (sys.getdefaultencoding()), which is normally ASCII.
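The difference between relying on the default and naming the encoding can be sketched in Python 3, where bytes.decode() plays the role of unicode(). Python 3's default encoding is UTF-8 rather than ASCII, so the sketch passes "ascii" explicitly to reproduce the Python 2 behaviour. ISO-8859-2 is an assumption standing in for whatever encoding the programmer knows the data to be in.

```python
# Bytes as they might arrive from a file or tag known to be ISO-8859-2
# (an assumed source encoding for this sketch).
raw = "\u010dern\u00fd".encode("iso-8859-2")

# What Python 2's unicode(raw) did with an ASCII default: decode as ASCII.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # the non-ASCII bytes are rejected

# Step 1 of the quoted list: decode with the encoding the programmer knows.
text = raw.decode("iso-8859-2")

# Step 2: encode for the output side; UTF-8 is just an example target.
out = text.encode("utf-8")

print(ascii_ok, len(raw), len(out))
```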

Printing a Unicode string to a terminal should work fine if the terminal
is properly configured. What that means depends on your operating
system.
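When output goes to a terminal, Python typically derives sys.stdout.encoding from the locale, which is why the locale has to match what the terminal can display. A minimal sketch, assuming Python 3, that checks whether a string fits a stream's encoding and falls back to the escaped ascii() form otherwise. debug_print and printable_form are hypothetical helper names, not library functions, and treating encoding-less streams as ASCII is an assumption of this sketch.

```python
import io
import sys
from typing import Optional, TextIO


def printable_form(text: str, encoding: str) -> str:
    """Return text unchanged if `encoding` can represent it, else its ascii() form."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # Not representable on this stream: fall back to hex escapes.
        return ascii(text)
    return text


def debug_print(text: str, stream: Optional[TextIO] = None) -> None:
    """Hypothetical helper: print text, escaping it if the stream cannot show it."""
    if stream is None:
        stream = sys.stdout
    # Streams without an encoding (e.g. io.StringIO) are treated as ASCII-only,
    # an assumption made for this sketch.
    encoding = getattr(stream, "encoding", None) or "ascii"
    print(printable_form(text, encoding), file=stream)


# Usage: an in-memory stream has no encoding, so the escaped form is written.
buf = io.StringIO()
debug_print("L\u00f6wis", buf)
print(buf.getvalue(), end="")
```

On a terminal whose locale uses UTF-8 or ISO-8859-1, debug_print("L\u00f6wis") prints the name unchanged; only text the terminal cannot represent is escaped.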

> 	* my assumptions are right

Most of them.

> 
> 	* why is that behaviour? -- if you search google you get
> thousands of errors like this -- with no proper solutions i must add

There is a proper solution. Unfortunately, very similar yet different
problems cause the same error message, and each problem has a different
proper solution:

- A Unicode error is raised when trying to combine a Unicode string
   and a byte string, if the byte string contains non-ASCII characters,
   e.g.

    u"Martin v. " + "Löwis"

   The proper solution is to convert the second string into a Unicode
   object, e.g. through

            unicode("Löwis", "iso-8859-1")

- A Unicode error is raised when a Unicode string is printed to
   a terminal. The proper solution is for the system administrator
   or the user to configure the locale correctly, so that Python
   knows which characters the terminal can print. For characters that
   are then still non-printable, repr() is the proper solution.

- A Unicode error is raised when a library does not support Unicode
   for some reason. The proper solution is to fix the library. A
   proper work-around is to explicitly convert Unicode strings into
   the encoding that the library expects.
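The first and third cases can be sketched in Python 3 terms, where str is the Unicode type and bytes is the byte string. Python 3 never decodes bytes implicitly when they are mixed with text: "Martin v. " + b"L\xf6wis" raises TypeError, so the conversion always has to be explicit. legacy_write is a hypothetical stand-in for a library that only accepts bytes, and ISO-8859-1 is an assumed encoding for it.

```python
# Case 1: mixing text and bytes. Python 3 refuses outright.
surname = b"L\xf6wis"  # ISO-8859-1 bytes for "Löwis"
try:
    "Martin v. " + surname
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The fix corresponds to unicode("Löwis", "iso-8859-1"): decode explicitly.
full = "Martin v. " + surname.decode("iso-8859-1")


# Case 3: hypothetical library function that only accepts bytes
# and is assumed to expect ISO-8859-1.
def legacy_write(data: bytes) -> bytes:
    if not isinstance(data, bytes):
        raise TypeError("legacy_write only accepts bytes")
    return data  # a real library would write the bytes somewhere


# Work-around: encode explicitly into the encoding the library expects.
written = legacy_write(full.encode("iso-8859-1"))
print(mixed_ok, written)
```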

> 	* is there an easier portable way (no sitecustomize.py changes)
> to do it

Yes, see above.

> 	* i was looking in site.py and there is deleted the
> sys.setdefaultencoding() function, but from the comments i do
> not know why -- you know it? why is user not allowed to change the
> default encoding? it seems reasonable to me if he/she could do that.

Yes, but that would not be a proper solution. It would mean that your
script now only works on your system, and fails on a system where
the default encoding has not been changed, or has been changed to
something else. Users should use a proper solution instead.
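A small illustration of why a changed default would not travel with the script: the same byte decodes to different characters under the two Latin encodings mentioned in this thread, and is not valid UTF-8 at all. Passing the encoding explicitly keeps the result identical on every machine. decode_field is a hypothetical helper name used only for this sketch.

```python
data = b"\xb9"

# ISO-8859-2 (Central European) maps 0xB9 to LATIN SMALL LETTER S WITH CARON.
as_latin2 = data.decode("iso-8859-2")  # '\u0161'
# ISO-8859-1 (Western European) maps the same byte to SUPERSCRIPT ONE.
as_latin1 = data.decode("iso-8859-1")  # '\u00b9'

# A lone 0xB9 is a UTF-8 continuation byte, so strict UTF-8 decoding fails.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def decode_field(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: the caller names the encoding, so the result
    does not depend on any process-wide default."""
    return raw.decode(encoding)


print(ascii(as_latin2), ascii(as_latin1), utf8_ok)
```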

Regards,
Martin