Problem with national characters

Leif B. Kristensen abuse at solumslekt.org
Thu Mar 31 18:02:30 EST 2005


Leif B. Kristensen wrote:

>Is there something else I have to do?

Please forgive me for talking with myself here :-) I should have looked
up Unicode in "Learning Python" before I asked. This seems to work:

>>> u'før'.upper()
u'F\xd8R'
>>> u'FØR'
u'F\xd8R'
>>> 'FØR'
'F\xd8R'

So far, so good. Note that, apart from the u prefix, the repr of the
uppercase Unicode string is identical to that of the plain string. But
when I try the builtin function unicode(), weird things happen:

>>> s='FØR'
>>> s
'F\xd8R'
>>> unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid data
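
For what it's worth, here is a minimal sketch of what seems to be going
on, under the assumption that the terminal sends Latin-1 (ISO-8859-1)
bytes while the default codec is UTF-8. It is written in Python 3
syntax, where the byte string is spelled b'F\xd8R'; on Python 2 the
equivalent call would presumably be unicode(s, 'latin-1') or
s.decode('latin-1'). The names raw, text and utf8_ok are only
illustrative.

```python
# The byte string from the session above: 'F', 0xD8, 'R'.
raw = b'F\xd8R'

# Assumption: the terminal produced Latin-1 (ISO-8859-1) bytes,
# where the single byte 0xD8 is the letter Ø (U+00D8).
text = raw.decode('latin-1')
print(text == 'F\u00d8R')

# Decoding the same bytes as UTF-8 fails, much like the traceback above:
# 0xD8 announces a two-byte sequence, and 'R' is not a valid second byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)

# Once decoded, case conversion handles the national character.
print(text.lower() == 'f\u00f8r')
```

So the bytes themselves are fine; the decoder just has to be told which
encoding they are in, rather than being left to the default.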

ActivePython 2.3.2 doesn't even seem to understand the 'u' prefix.
So even if I can get this to work on my own Linux machine, it hardly
looks like a portable solution.

Seems like the "solution" is to keep away from letters above ASCII-127,
like we've done since the dawn of computing ...
-- 
Leif Biberg Kristensen
http://solumslekt.org/


