[Tutor] symbol encoding and processing problem

Evert Rol evert.rol at gmail.com
Wed Oct 17 18:28:23 CEST 2007


>>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>>> Again, I think this can be simplified to
>>>    raw = u"125° 15' 5.55''"
>> It does, but it's getting confusing when I compare the following:
>>  >>> raw = u"125° 15' 5.55''"
>> 125° 15' 5.55''
>
> Where does that output come from?

sorry, my bad: an over-hasty copy of nonexistent output.

>>  >>> print u"125° 15' 5.55''"
>> UnicodeEncodeError: 'ascii' codec can't encode characters in  
>> position  3-4: ordinal not in range(128)
>
> print must encode unicode strings. It tries to encode them using
> the default encoding, which doesn't work because the source is not
> ascii.
>>  >>> print u"125° 15' 5.55''".encode('utf-8')
>> 125° 15' 5.55''
>
> That is the way to get it to work.
>
>>  >>> print unicode("125° 15' 5.55''")
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in  
>> position  3: ordinal not in range(128)
>
> Here the problem is trying to create the unicode string using the  
> default encoding, again it doesn't work because the source contains  
> non-ascii characters.
>
>>  >>> print unicode("125° 15' 5.55''", 'utf-8')
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0'  
>> in  position 3: ordinal not in range(128)
>
> This is the same as the first encode error.

This is the thing I don't get; or only partly: I'm sending a utf-8  
encoded string to print. print apparently ignores that, and still  
tries to print things using ascii encoding. If I'm correct in that  
assessment, then why would print ignore that?
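
A minimal sketch of the two directions involved, in case it helps. It is written in Python 3 syntax, which is an assumption on my part: the session above is Python 2, where `unicode` corresponds to Python 3's `str` and a plain `str` corresponds to `bytes`. The helper name `try_encode` is made up for illustration. The point it tries to show is that `unicode(..., 'utf-8')` decodes the bytes into text, and that text no longer carries any encoding with it, so whoever prints it has to pick an encoding again.

```python
# Python 3 sketch; in the Python 2 session above, unicode plays the role
# of str here and a plain str plays the role of bytes.
raw_bytes = b"125\xc2\xb0 15' 5.55''"  # UTF-8 bytes, as typed in a UTF-8 terminal

# Decoding turns bytes into text. The resulting text object does not
# "remember" that it came from UTF-8.
text = raw_bytes.decode("utf-8")


def try_encode(s, encoding):
    """Hypothetical helper: return the encoded bytes, or the error message."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# Encoding with ascii fails on the degree sign, much like print did above.
ascii_result = try_encode(text, "ascii")

# Encoding with utf-8 succeeds and gives back the original bytes.
utf8_result = try_encode(text, "utf-8")
```

So in this reading, print is not ignoring the utf-8 argument; that argument was used up by the decode step, and the later encode step falls back to the default.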


>> So apart from the errors all being slightly different, is there   
>> perhaps some difference between the str() and repr() functions  
>> (looks  like repr uses escape backslashes)?
>
> Right.
>
>> And checking the default encoding inside the python cmdline, I see
>> that my sys module doesn't actually have a setdefaultencoding()
>> method; was that something that should have been properly configured
>> at compile time? The documentation mentions something about the site
>> module, but I can't find it there either.
>
> The setdefaultencoding() function (it's not a method, it is a  
> module-level function)

yes, sorry, got my terminology wrong there.

> is removed from the sys module as part of startup (I think by the  
> site module). That is why you have to call it from  
> sitecustomize.py. You can also
>   reload(sys)
> to restore it but it's better to write your app so it doesn't  
> require the default encoding to be changed.

I.e., use encode('utf-8') where necessary?
But I did see some examples pass by using

   import sys
   sys.setdefaultencoding('utf-8')

??
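
For comparison, a rough sketch of the "encode where necessary" approach, leaving the default encoding alone. Again this is Python 3 syntax (an assumption; the thread is Python 2), `write_text` is a hypothetical helper of my own, and the `BytesIO` buffer only stands in for a real file or terminal stream.

```python
import io


def write_text(stream, text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then write the bytes."""
    stream.write(text.encode(encoding))


# BytesIO stands in for a binary output stream such as an open file.
buf = io.BytesIO()
write_text(buf, "125\u00b0 15' 5.55''\n")
written = buf.getvalue()
```

The encoding is chosen at the point of output, so nothing depends on how the interpreter happens to be configured.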

Oh well, in general I tend to play long enough with things like this  
that 1) I get it (script) working, and 2) I have a decent feeling  
(90%) that I actually understand what is going on, and why other  
things failed. Which is roughly where I am now ;-).

   Evert



