[Tutor] UnicodeDecodeError

Michael Lange klappnase at freenet.de
Wed Feb 23 21:29:06 CET 2005


On Wed, 23 Feb 2005 07:21:40 -0500
Kent Johnson <kent37 at tds.net> wrote:

> 
> This is a part of Python that still confuses me. I think what is happening is
> - self.nextfile is a Unicode string sometimes (when it includes special characters)
> - the gettext string is a byte string
> - to compare the two, the byte string is promoted to Unicode by decoding it with the system default 
> encoding, which is generally 'ascii'.
> - the gettext string includes non-ascii characters and the codec raises an exception.
> 
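Kent's explanation can be reproduced explicitly. The sketch below uses Python 3, where `bytes` corresponds to Python 2's byte string and `str` to `unicode`; because Python 3 no longer promotes bytes to text implicitly, the ASCII decode that Python 2 performed behind the scenes is written out by hand. The German message is a hypothetical translation, not taken from the program.

```python
# Hypothetical translated message: "ä" becomes the single byte 0xe4 in Latin-1.
translated = "Keine Zieldatei ausgewählt".encode("latin-1")


def promote_like_python2(raw: bytes) -> str:
    """Mimic Python 2's implicit byte-string-to-unicode promotion,
    which decodes with the default encoding (usually 'ascii')."""
    return raw.decode("ascii")


def first_bad_byte(raw: bytes):
    """Return the offset where the implicit ASCII decode fails, or None."""
    try:
        promote_like_python2(raw)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_bad_byte(translated))  # → 22
print(first_bad_byte(b"No destination file selected"))  # → None
```

A pure-ASCII byte string survives the promotion, which is why the comparison only fails once the translated text contains a character such as "ä".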
Thanks Kent,

Now the confusion seems to be clearing up (at least partially). After some googling, it
seems to me that the best bet is to use unicode strings exclusively. When I set the unicode flag
in gettext.install() to 1, the gettext strings are unicode; however, there's still a problem with the
user input. As you guessed, "self.nextfile" is unicode only *sometimes*. I tried changing the line
from the old traceback to:

    if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):

When self.nextfile is an existing file "\xe4.wav" that was clicked on in the file dialog's file list, this works;
however, when I type "\xe4.wav" into the file dialog's entry field, I get:

TypeError Exception in Tk callback
  Function: <bound method Snackrecorder.start of <snackrecorder.Snackrecorder instance at 0xb774518c>> (type: <type 'instancemethod'>)
  Args: ()
Traceback (innermost last):
  File "/usr/lib/python2.3/site-packages/Pmw/Pmw_1_2/lib/PmwBase.py", line 1747, in __call__
    return apply(self.func, args)
  File "/usr/local/share/phonoripper-0.6.2/snackrecorder.py", line 304, in start
    if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):
TypeError: decoding Unicode is not supported

At least this might explain why "A\xe4" worked and "\xe4" didn't, as I mentioned in a previous post.
Now the question is how to determine whether self.nextfile is unicode or a byte string.
Or, maybe even better, how to make sure that self.nextfile is always a byte string so I can safely convert
it to unicode later on. But how do I convert unicode user input into byte strings when I don't even
know the user's encoding? I guess this will require some further research.
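One way to answer the first question is to check the type before decoding, so that text passes through unchanged and only bytes get decoded. A minimal sketch in Python 3 (in Python 2.3 the equivalent test would be `isinstance(self.nextfile, unicode)`); the helper name `to_text` is hypothetical, and falling back to `locale.getpreferredencoding()` is only a guess at the user's encoding, not a guarantee.

```python
import locale


def to_text(value, encoding=None):
    """Return `value` as text, decoding it only if it is still bytes.

    Decoding something that is already text is what produced
    "decoding Unicode is not supported" in Python 2, so text is
    returned unchanged.
    """
    if isinstance(value, bytes):
        # Assumption: without an explicit encoding, guess from the locale.
        return value.decode(encoding or locale.getpreferredencoding(False))
    return value


# A name clicked in the file list (bytes) and one typed into the entry (text).
clicked = to_text(b"\xe4.wav", "iso8859-1")
typed = to_text("\xe4.wav")
print(clicked == typed)  # → True
```

Normalizing to text at the point where input arrives, rather than to bytes, avoids having to guess an output encoding for typed input.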

> I don't know what the best solution is. Two possibilities (substitute your favorite encoding for 
> latin-1):
> - decode the gettext string, e.g.
>    if self.nextfile == _('No destination file selected').decode('latin-1'):
> 
> - set your default encoding to latin-1. (This solution is frowned on by the Python-Unicode 
> cognoscenti and it makes your programs non-portable). Do this by creating a file 
> site-packages/sitecustomize.py containing the lines
> import sys
> sys.setdefaultencoding('latin-1')
> 
> Kent
> 
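Kent's first suggestion, in sketch form. This assumes the translation catalog hands back Latin-1 bytes, roughly as Python 2's gettext did when gettext.install() was called without the unicode flag; `fake_gettext`, `no_destination`, and the German text are hypothetical stand-ins for the real catalog and program. It is written in Python 3 so it runs today, with the byte/text split made explicit.

```python
# Hypothetical catalog: translated messages stored as Latin-1 bytes.
CATALOG = {
    "No destination file selected": "Keine Zieldatei ausgewählt".encode("latin-1"),
}


def fake_gettext(msg: str) -> bytes:
    """Stand-in for a gettext lookup that returns byte strings."""
    return CATALOG.get(msg, msg.encode("latin-1"))


def no_destination(nextfile: str) -> bool:
    # Decode the translated bytes explicitly so both operands are text
    # before comparing, instead of relying on an implicit ASCII decode.
    return nextfile == fake_gettext("No destination file selected").decode("latin-1")


print(no_destination("Keine Zieldatei ausgewählt"))  # → True
print(no_destination("\xe4.wav"))  # → False
```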

Unfortunately the latter is not an option, because I definitely need portability. I should probably use
utf-8.
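A sketch of that approach: keep text internally and encode or decode as UTF-8 only at the edges, where data enters or leaves the program. Note that UTF-8 and Latin-1 are not interchangeable: "ä" is the single byte 0xE4 in Latin-1 but the two bytes 0xC3 0xA4 in UTF-8. The function names are hypothetical; Python 3 syntax.

```python
def to_wire(name: str) -> bytes:
    """Encode text as UTF-8 when it leaves the program."""
    return name.encode("utf-8")


def from_wire(raw: bytes) -> str:
    """Decode UTF-8 bytes as they enter the program."""
    return raw.decode("utf-8")


name = "\xe4.wav"  # "ä.wav"
print(to_wire(name))  # → b'\xc3\xa4.wav'
print(name.encode("latin-1"))  # → b'\xe4.wav'
print(from_wire(to_wire(name)) == name)  # → True
```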

Thanks and best regards

Michael


> > 
> > ######################################################################
> > Error: 1
> > UnicodeDecodeError Exception in Tk callback
> >   Function: <bound method Snackrecorder.start of <snackrecorder.Snackrecorder instance at 0xb77fe24c>> (type: <type 'instancemethod'>)
> >   Args: ()
> > Traceback (innermost last):
> >   File "/usr/lib/python2.3/site-packages/Pmw/Pmw_1_2/lib/PmwBase.py", line 1747, in __call__
> >     return apply(self.func, args)
> >   File "/usr/local/share/phonoripper/snackrecorder.py", line 305, in start
> >     if self.nextfile == _('No destination file selected'):
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 22: ordinal not in range(128)
> > 
> > ######################################################################
> 

