[Tutor] Handling international characters
Fri May 23 08:08:02 2003
At 22:10 2003-05-22 -0300, Jorge Godoy wrote:
>Is there anyway I can solve/handle that?
It's supposed to be like that. You aren't printing strings you know,
you are printing entire data structures. These data structures might
contain anything, from modules or functions to themselves.
>>> import re
>>> a = [re, 0.1, 'Hellö']
>>> print a
[<module 're' from 'G:\Python22\lib\re.pyc'>, 0.10000000000000001,
'Hell\xf6']
There are two built-in Python functions for extracting a text string
from any object. The str() function returns something which is
hopefully pleasing to the eye. The repr() function returns something
which identifies the object as exactly as possible.
>>> f = 0.1
>>> print str(f)
0.1
>>> print repr(f)
0.10000000000000001
One tenth can't be described exactly as a binary number, any
more than a third can be described exactly as a decimal number.
repr() shows that, str() hides it.
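To see what the stored value of 0.1 really is, here is a small sketch. It assumes a newer interpreter than the 2.2 session above (Python 2.7 or 3.x), where the decimal module can convert a float exactly. In those versions repr() picks the shortest string that round-trips, so the long form has to be requested explicitly.

```python
from decimal import Decimal

f = 0.1

# Seventeen significant digits are enough to tell any two doubles apart;
# this is the form that repr() printed in Python 2.2.
seventeen_digits = '%.17g' % f

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact_value = Decimal(f)

print(seventeen_digits)
print(exact_value)

# The stored value is close to one tenth, but not equal to it.
print(exact_value == Decimal('0.1'))
```

The last line prints False: the float holds the nearest binary fraction, not one tenth itself.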
>>> s = 'å'
>>> print str(s)
å
>>> print repr(s)
'\xe5'
But note that '\xe5' is *not* a Python unicode object. The
corresponding unicode object would be u'\xe5'. The two just happen
to use the same numeric code, because Latin1 matches the first 256
Unicode code points.
>>> s = 'å'
>>> l = [s, unicode(s, 'latin1')]
>>> print l
['\xe5', u'\xe5']
>>> for x in l: print x
In ISO8859-1 (Latin1), the encoding in use here, an a with a ring over
it is stored as the numeric value which is written e5 in hexadecimal
notation. Anything outside seven-bit ASCII, that is, anything that would
look odd to an American computer user ;), is printed as a hexadecimal
escape by repr().
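The relation between the 8-bit string and the unicode object can be checked directly. This is a sketch for Python 3 (an assumption; the session above is Python 2.2), where 8-bit strings are called bytes, unicode strings are plain str, and ascii() gives the escaped form that repr() produced in Python 2.

```python
raw = b'\xe5'                  # one Latin1 byte, like the 8-bit string above
text = raw.decode('latin-1')   # the corresponding unicode string

# Latin1 maps each byte to the code point with the same number.
code_point = ord(text)

# ascii() escapes everything outside seven-bit ASCII.
escaped = ascii(text)

# In UTF-8 the same character takes two bytes, so the match between
# byte value and code point depends on the encoding.
utf8_bytes = text.encode('utf-8')

print(hex(code_point), escaped, utf8_bytes)
```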
One good thing about that is that regardless of how clumsy our
computers or the email systems in between are, I'll be able to
extract "Érica" from the repr() description, since it's all seven-bit
ASCII.
If you just loop through your data structure and print each element,
it will come out printed properly.
If this is supposed to be read by someone, do you really want to keep
the brackets and quote marks?
>>> r = [['Jorge', 'Godoy'], ['Juliano', 'Godoy'], ['Érica', 'Balaniuc']]
>>> print r
[['Jorge', 'Godoy'], ['Juliano', 'Godoy'], ['\xc9rica', 'Balaniuc']]
>>> for row in r:
...     for element in row:
...         print element,
...
(This should come out right. It might not fare well through the email.)
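The difference between printing the structure and printing its elements can be sketched as follows. This assumes Python 3 rather than the 2.2 used above; there the list no longer shows É as an escape, but the brackets and quote marks still come from repr() being applied to each element.

```python
r = [['Jorge', 'Godoy'], ['Juliano', 'Godoy'], ['Érica', 'Balaniuc']]

# str() of a list uses repr() on each element, so the brackets and
# quote marks are part of the output.
as_structure = str(r)

# Joining the elements of each row yields plain text, one line per row.
lines = [' '.join(row) for row in r]

print(as_structure)
for line in lines:
    print(line)
```

Building the lines first and printing them afterwards keeps the formatting decision in one place, which helps if the output later goes to a file instead of the screen.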
>This is really bugging me for a while. I've tried changing some
>parameters at site.py but I had no success (I tracked another problem
>to what I thought was an XML problem, but then with the editing tool
>that also parses the XML everything works... I don't know what's going
>on).
This is the wrong path to take. Don't mess with site.py.
If you convert data between unicode and old-fashioned 8-bit strings,
you must do that conversion explicitly, and state what encoding you
use:
  string.decode(encoding) => unicode_string
  unicode_string.encode(encoding) => string
>>> s = "%s %s" % tuple(r[-1])
>>> print s
>>> u = s.decode('latin-1')
>>> print u
>>> print u.encode('latin1') # will print right in windows/unix etc
>>> print u.encode('cp850') # will print right in DOS box
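Here is the same explicit conversion as a sketch in Python 3 (an assumption; the session above is Python 2.2), where the unicode string is a str and the encoded result is bytes. It shows why the encoding has to be named: the two encodings store É as different bytes, and decoding with the wrong one fails silently.

```python
text = 'Érica Balaniuc'                  # a unicode string (str in Python 3)

latin1_bytes = text.encode('latin-1')    # É becomes the single byte 0xc9
cp850_bytes = text.encode('cp850')       # the DOS code page stores É as 0x90

# Decoding with the encoding that produced the bytes restores the text.
round_trip = cp850_bytes.decode('cp850')

# Decoding with a different encoding raises no error here but gives
# the wrong first character.
mismatched = cp850_bytes.decode('latin-1')

print(latin1_bytes)
print(cp850_bytes)
print(round_trip == text, mismatched == text)
```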
Magnus Lycka (It's really Lyckå), firstname.lastname@example.org
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program