unicode issue

Dave Angel davea at ieee.org
Wed Sep 30 18:58:39 CEST 2009


Piet van Oostrum wrote:
>>>>>> Dave Angel <davea at dejaviewphoto.com> (DA) wrote:
>
>> DA> Works for me:
>> DA> rrr = downcode(u"Žabovitá zmiešaná kaša")
>> DA> print repr(rrr)
>> DA> print rrr
>
>> DA> prints out:
>> DA> u'Zabovita zmiesana kasa'
>> DA> Zabovita zmiesana kasa
>
>> DA> I did have to add an encoding declaration as line 2 of the file:
>> DA> #-*- coding: latin-1 -*-
>
>> DA> and I had to convince my editor (Komodo) to save the file in utf-8.
>
> *Seems to work*.
> If you save in utf-8 the coding declaration also has to be utf-8.
> Besides, many of these characters won't be representable in latin-1.
> The reason it worked is that these characters were translated into two-
> or more-bytes sequences and replace did work with these. But it's
> dangerous, as they are then no longer the unicode characters they were
> intended to be. 
>   
Thanks for the correction. What I meant by "works for me" is that the 
single example in the docstring translated okay. But I do have a lot to 
learn about using Unicode in sources, and I want to learn.
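
The mismatch Piet describes can be reproduced in a few lines. This is only a sketch, written for Python 3 rather than the Python 2 used in this thread, and the single replace() call is a stand-in for illustration, not the actual downcode() code. It shows UTF-8 bytes being read as latin-1, each accented character turning into two unrelated characters, and a replacement still appearing to work because the lookup key is mangled in the same way.

```python
# Sketch only: simulates a file saved as UTF-8 but declared as latin-1.
# Escapes are used so the example does not depend on how it is saved.
text = u"\u017dabovit\u00e1 zmie\u0161an\u00e1 ka\u0161a"  # the docstring example

utf8_bytes = text.encode("utf-8")        # what the editor wrote to disk
misread = utf8_bytes.decode("latin-1")   # what a latin-1 declaration yields

# Each of the five accented characters became two latin-1 characters.
print(len(text), len(misread))
print(repr(misread))

# A replacement key typed in the same file is mangled identically,
# so the replace still matches. Hypothetical stand-in for one mapping entry.
mangled_key = u"\u017d".encode("utf-8").decode("latin-1")
patched = misread.replace(mangled_key, "Z")
print(repr(patched))

# The character itself cannot be stored in latin-1 at all.
try:
    u"\u017d".encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

The strings in the misread version are no longer the intended Unicode characters, so anything that inspects them (length, case conversion, comparison with correctly decoded text) would give wrong answers.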

So tell me, how were we supposed to guess what encoding the original 
message used? I originally had the mailing list message (in Thunderbird 
email). When I copied (copy/paste) to Komodo IDE (text editor), it 
wouldn't let me save because the file type was ASCII. So I randomly 
chose latin-1 for file type, and it seemed to like it.

At that point I expected and got errors from Python because I had no 
coding declaration. I used latin-1, and still had problems, though I 
forget what they were. Only when I changed the file encoding type again, 
to utf-8, did the errors go away. I agree that they should agree, but I 
don't know how to reconcile the copy/paste boundary, the file type 
(without BOM, which is another variable), the coding declaration, and 
the stdout implicit ASCII encoding. I understand a bunch of it, but not 
enough to be able to safely walk through the choices.
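
One way to approach those boundaries is to name the encoding explicitly at each of them instead of relying on defaults. The sketch below assumes Python 3 (the thread itself used Python 2), and the file name and variable names are made up for illustration. The source is pure ASCII, so no coding declaration is needed. A file with literal non-ASCII characters would need a declaration such as "# -*- coding: utf-8 -*-" that matches what the editor actually saves.

```python
import io
import os
import sys
import tempfile

text = u"\u017dabovit\u00e1 zmie\u0161an\u00e1 ka\u0161a"

# File boundary: state the encoding on both write and read.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)
with io.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

# BOM: "utf-8-sig" drops a leading BOM if present and also reads
# files that have none, so it is a tolerant choice for input.
with io.open(path, "r", encoding="utf-8-sig") as f:
    no_bom = f.read()

# stdout boundary: find out what the console claims to accept and
# degrade unencodable characters to "?" instead of raising an error.
out_encoding = sys.stdout.encoding or "ascii"
safe = text.encode(out_encoding, errors="replace").decode(out_encoding)
print(safe)
```

The copy/paste boundary has no equivalent knob in code. The pasted text only takes on an encoding when the editor saves it, which is why the editor's save encoding and the coding declaration have to agree.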

Is this all written up in one place, in a form an experienced programmer 
can make sense of? I've nibbled at the edges (even wrote a UTF-8 
encoder/decoder a dozen years ago).

DaveA


