[Tutor] Unicode Problem

Sat Sep 11 12:14:14 CEST 2004

Hi Steve,
thanks for your explanation.

> >>> print u'ä' + 'ä'
> 
> is the equivalent of:
> 
> >>>print u'ä' + 'ä'.decode('ascii')     # what I don't get is why
> 
> this is called decode ??
> 

I think that's because the value is a byte string 'encoded' in Latin-1 or
whatever, and decode() turns it back into a Unicode string, here using the
ascii codec. Maybe they should've called it 'recode', since ascii is a code too.
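
To make the naming concrete: in Python's model, "decode" goes from encoded bytes to Unicode text using the named codec, and "encode" goes the other way. A minimal sketch in Python 3 syntax, where `bytes` plays the role of Python 2's `str` and `str` the role of `unicode`:

```python
# Python 3 sketch: bytes.decode() turns encoded bytes into text (str),
# str.encode() turns text back into bytes.
raw = b'\xe4'                  # the single byte Latin-1 uses for a-umlaut

text = raw.decode('latin-1')   # bytes -> text, reading the byte as Latin-1
print(ascii(text))             # '\xe4', like u'\xe4' in Python 2

try:
    raw.decode('ascii')        # ASCII only defines bytes 0..127
except UnicodeDecodeError as exc:
    print(exc)

print(text.encode('latin-1') == raw)   # text -> bytes round-trips
```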

What I still don't understand is why the Python interpreter can handle
u'ä' but fails on unicode('ä'):

   >>> u'ä'
   u'\xe4'
   >>> unicode('ä')
   Traceback (most recent call last):
     File "<interactive input>", line 1, in ?
   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in
   position 0: ordinal not in range(128)
   >>> unicode('ä'.decode('Latin-1'))
   u'\xe4'
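
One likely reason for the difference: a single byte such as 0xE4 carries no information about its encoding. For the literal u'ä', the Python 2 interpreter presumably decodes the typed characters with the console or source encoding it knows about, while unicode('ä') without an encoding argument only receives the raw byte and falls back to the default codec, which is ASCII in Python 2. This Python 3 sketch (`decode_all` is a made-up helper for illustration) decodes the same byte with several codecs:

```python
def decode_all(raw, codecs):
    """Hypothetical helper: decode the same bytes with each codec in turn."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None   # these bytes are not valid in this codec
    return results

results = decode_all(b'\xe4', ('latin-1', 'cp1251', 'ascii', 'utf-8'))
for codec, text in results.items():
    print(codec, '->', ascii(text))
```

The same byte is an a-umlaut in Latin-1, a Cyrillic letter in cp1251, and invalid in both ASCII and UTF-8, so without being told the encoding, unicode() has no safe way to guess.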

Wouldn't it be nicer if the unicode function used the current locale
to decode byte values in range(128, 256)?
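
A rough sketch of what that suggestion could look like, in Python 3; `to_text` is a hypothetical helper, not a standard function. It falls back to `locale.getpreferredencoding()` when no encoding is given:

```python
import locale

def to_text(raw, encoding=None):
    """Hypothetical helper: decode bytes, defaulting to the locale's encoding."""
    if encoding is None:
        # The encoding the current locale prefers, e.g. 'UTF-8' or 'cp1252'.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)

print(ascii(to_text(b'\xe4', 'latin-1')))   # '\xe4'
```

The catch is that the default result then depends on the machine: under a UTF-8 locale the lone byte 0xE4 is invalid and decoding fails, while under a Latin-1 or cp1252 locale it becomes 'ä'. Passing the encoding explicitly keeps a script's behavior the same everywhere.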

Thanks again

    Rainer

I apologize for 'kidnapping' a running thread.