[Tutor] symbol encoding and processing problem

Evert Rol evert.rol at gmail.com
Wed Oct 17 17:48:57 CEST 2007


>> raw = unicode("125° 15' 5.55''", 'utf-8')
>
> Again, I think this can be simplified to
>    raw = u"125° 15' 5.55''"

It does, but it's getting confusing when I compare the following:

 >>> raw = u"125° 15' 5.55''"
125° 15' 5.55''

 >>> print u"125° 15' 5.55''"
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

 >>> print u"125° 15' 5.55''".encode('utf-8')
125° 15' 5.55''

 >>> print unicode("125° 15' 5.55''")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3: ordinal not in range(128)

 >>> print unicode("125° 15' 5.55''", 'utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 3: ordinal not in range(128)
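
To compare the three errors without the terminal or locale getting in the
way, here is a minimal sketch that triggers them with explicit encode() and
decode() calls. The describe() helper and the variable names are made up for
illustration. The Latin-1 step is only an assumption about how the
interpreter might have read the typed UTF-8 bytes; it would account for the
"position 3-4" message, but the transcript does not confirm it.

```python
# Sketch only: reproduce the errors with explicit codec calls, so the
# result does not depend on the terminal, the locale or print.

text = u"125\u00b0 15' 5.55''"      # degree sign written as an escape
utf8_bytes = text.encode("utf-8")   # degree sign becomes bytes 0xc2 0xb0


def describe(action):
    """Run action() and report the type and position of any Unicode error."""
    try:
        action()
    except UnicodeError as exc:
        # exc.end is exclusive, so subtract 1 to match the error message.
        return "%s at %d-%d" % (type(exc).__name__, exc.start, exc.end - 1)
    return "ok"


# One character (u'\xb0') that ASCII cannot encode.
encode_one = describe(lambda: text.encode("ascii"))

# Byte 0xc2 that ASCII cannot decode.
decode_bytes = describe(lambda: utf8_bytes.decode("ascii"))

# Assumption: the UTF-8 bytes were read as Latin-1, which turns the
# degree sign into two separate characters at positions 3 and 4.
misread = utf8_bytes.decode("latin-1")
encode_two = describe(lambda: misread.encode("ascii"))

print(encode_one)
print(decode_bytes)
print(encode_two)
```

Under that assumption, the first error above would come from encoding a
two-character misreading of the degree sign, and the last one from encoding
the single, correctly decoded character.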


So apart from the errors all being slightly different, is there
perhaps some difference between the str() and repr() functions (it
looks like repr() escapes non-ASCII characters with backslashes)?
Or does it simply have to do with my locale, which is set to the
default "C" (terminal = standard Mac OS X terminal, with UTF-8
encoding)? That wouldn't explain to me why the third statement works,
though.
And checking the default encoding from the Python command line, I see
that my sys module doesn't actually have a setdefaultencoding()
function; is that something that should have been configured at
compile time? The documentation mentions something about the site
module, but I can't find it there either.
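
For reference, a small sketch of how the relevant settings can be inspected
from within Python. The encoding_report() name is hypothetical; the calls
themselves are standard library. As far as I understand it, Python 2's site
module removes sys.setdefaultencoding() during startup, which would explain
why the function is missing from sys, but treat that as my reading rather
than a confirmed diagnosis of this setup.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that can affect print and implicit conversions."""
    return {
        # Codec used for implicit str/unicode conversions ('ascii' on Python 2).
        "default": sys.getdefaultencoding(),
        # Encoding print uses for the terminal; may be None or missing.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Whether sys still exposes setdefaultencoding() after startup.
        "has_setdefaultencoding": hasattr(sys, "setdefaultencoding"),
    }


report = encoding_report()
for name in sorted(report):
    print("%s: %r" % (name, report[name]))
```

Running this once with the "C" locale and once with a UTF-8 locale should
show whether the locale changes what print sees as the output encoding.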

Any enlightenment on this is welcome.

   Evert



