[Tutor] symbol encoding and processing problem
Evert Rol
evert.rol at gmail.com
Wed Oct 17 18:28:23 CEST 2007
>>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>>> Again, I think this can be simplified to
>>> raw = u"125° 15' 5.55''"
>> It does, but it's getting confusing when I compare the following:
>> >>> raw = u"125° 15' 5.55''"
>> 125° 15' 5.55''
>
> Where does that output come from?
Sorry, my bad: an over-hasty copy of non-existent output.
>> >>> print u"125° 15' 5.55''"
>> UnicodeEncodeError: 'ascii' codec can't encode characters in
>> position 3-4: ordinal not in range(128)
>
> print must encode unicode strings. It tries to encode them using
> the default encoding which doesn't work because the source is not
> ascii.
>> >>> print u"125° 15' 5.55''".encode('utf-8')
>> 125° 15' 5.55''
>
> That is the way to get it to work.
>
>> >>> print unicode("125° 15' 5.55''")
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in
>> position 3: ordinal not in range(128)
>
> Here the problem is trying to create the unicode string using the
> default encoding, again it doesn't work because the source contains
> non-ascii characters.
>
>> >>> print unicode("125° 15' 5.55''", 'utf-8')
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0'
>> in position 3: ordinal not in range(128)
>
> This is the same as the first encode error.
This is the thing I don't get; or only partly: I'm sending a utf-8
encoded string to print. print apparently ignores that, and still
tries to print things using ascii encoding. If I'm correct in that
assessment, then why would print ignore that?
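
A minimal sketch of the distinction at play here, assuming the input is a UTF-8 byte string: unicode(s, 'utf-8') corresponds to decoding bytes, and the resulting unicode object carries no encoding of its own, so any output step still has to encode it. The variable names are only illustrative, and the escapes are spelled out so the snippet should behave the same on Python 2.6+ and Python 3.3+.

```python
# UTF-8 bytes for the coordinate string; the degree sign is the pair \xc2\xb0
raw_bytes = b"125\xc2\xb0 15' 5.55''"

# Decoding bytes -> unicode text (what unicode(raw_bytes, 'utf-8') does in Python 2)
text = raw_bytes.decode('utf-8')

# The decoded text is not "UTF-8" any more; it is a sequence of code points.
# Encoding it as ASCII fails on the degree sign at index 3.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 gives back the original bytes
round_trip = text.encode('utf-8')
```

So the 'utf-8' argument only describes how to read the bytes on the way in; it says nothing about how the text is written on the way out.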
>> So apart from the errors all being slightly different, is there
>> perhaps some difference between the str() and repr() functions
>> (looks like repr uses escape backslashes)?
>
> Right.
>
>> And checking the default encoding inside the python cmdline, I
>> see that my sys module doesn't actually have a setdefaultencoding()
>> method; was that something that should have been properly
>> configured at compile time? The documentation mentions something
>> about the site module, but I can't find it there either.
>
> The setdefaultencoding() function (it's not a method, it is a
> module-level function)
yes, sorry, got my terminology wrong there.
> is removed from the sys module as part of startup (I think by the
> site module). That is why you have to call it from
> sitecustomize.py. You can also
> reload(sys)
> to restore it but it's better to write your app so it doesn't
> require the default encoding to be changed.
I.e., use encode('utf-8') where necessary?
But I did see some examples pass by using
    import sys
    sys.setdefaultencoding('utf-8')
??
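
For comparison, a rough sketch of the explicit approach, which leaves the default encoding alone: encode at the point where text leaves the program. The helper write_text and the in-memory buffer are hypothetical stand-ins (not from any library) for whatever file or terminal stream is actually used.

```python
import io

def write_text(stream, text, encoding='utf-8'):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    data = text.encode(encoding)
    stream.write(data)
    return len(data)

# io.BytesIO stands in for a file or terminal opened in binary mode
buffer = io.BytesIO()
written = write_text(buffer, u"125\xb0 15' 5.55''\n")
output = buffer.getvalue()
```

With this pattern the encoding is named in exactly one place, so nothing depends on how the interpreter was configured at startup.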
Oh well, in general I tend to play long enough with things like this
that 1) I get it (script) working, and 2) I have a decent feeling
(90%) that I actually understand what is going on, and why other
things failed. Which is roughly where I am now ;-).
Evert