unicode mystery/problem

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Sep 22 09:45:00 EDT 2006

In <mailman.453.1158927918.10491.python-list at python.org>, Petr Jakeš
wrote:

> I have try to experiment with the code a bit.
> the simplest code where I can demonstrate my problems:
> #!/usr/bin python
> import sys
> print "default", sys.getdefaultencoding()
> print "stdout", sys.stdout.encoding
> a=['P\xc5\x99\xc3\xad','Petr Jake\xc5\xa1']
> b="my nice try %s" % ''.join(a).encode("utf-8")

You have two byte strings in the list `a` and try to *encode* them as
utf-8.  That does not work.  You can make the example even a bit simpler::

 'P\xc5\x99\xc3\xadPetr Jake\xc5\xa1'.encode('utf-8')

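You can't *encode* byte strings, just *decode* them.  What happens is
that Python first tries to make a unicode string from the byte string so
that it can encode that as utf-8.  But it decodes as ASCII, as that is the
default, and that fails with a `UnicodeDecodeError` on the first non-ASCII
byte (the ``\xc5``).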
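A minimal sketch of that hidden step, written so it should run on both Python 2 and Python 3 (the names `raw`, `failed_at` and `text` are only for illustration).  Python 3 has no `encode` on byte strings at all, so the implicit ASCII decode is spelled out explicitly here:

```python
# The joined byte string from the example; the b prefix is a no-op in
# Python 2 and makes this a bytes object in Python 3.
raw = b'P\xc5\x99\xc3\xadPetr Jake\xc5\xa1'

# Roughly what Python 2 does for raw.encode('utf-8'): it decodes with the
# default encoding (ASCII) first and only then encodes the result.
try:
    raw.decode('ascii').encode('utf-8')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first byte ASCII cannot decode

# Decoding with the encoding the bytes really are in works fine.
text = raw.decode('utf-8')
```

Here `failed_at` ends up pointing at the ``\xc5`` right after the `P`, and `text` is a proper unicode string.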

Don't mix byte strings and unicode strings.  Put an encoding declaration
at the top of your file and convert everything to unicode on the "way in"
and to the proper encoding on the "way out" of your program.
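Applied to the example, that advice could look roughly like the sketch below.  It assumes the input bytes and the desired output are both utf-8; `decode_all` and `encode_out` are hypothetical helper names, not anything from the standard library:

```python
# -*- coding: utf-8 -*-
# The line above is the encoding declaration for the source file itself.


def decode_all(byte_strings, encoding='utf-8'):
    """Way in: turn every byte string into a unicode string."""
    return [s.decode(encoding) for s in byte_strings]


def encode_out(text, encoding='utf-8'):
    """Way out: turn the finished unicode string back into bytes."""
    return text.encode(encoding)


a = [b'P\xc5\x99\xc3\xad', b'Petr Jake\xc5\xa1']

names = decode_all(a)                    # unicode from here on
b = u"my nice try %s" % u''.join(names)  # only unicode strings are mixed
out = encode_out(b)                      # bytes again, ready to be written
```

Inside the program only unicode strings meet each other, so no implicit ASCII decoding gets a chance to happen; the single `encode` at the end is the only place where an output encoding has to be chosen.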

	Marc 'BlackJack' Rintsch
