[Tutor] UnicodeDecodeError

Kent Johnson kent37 at tds.net
Thu Feb 24 05:16:20 CET 2005


Michael Lange wrote:
> now it looks like the total confusion seems to clear up (at least partially). After some googling it
> seems to me that the best bet is to use unicode strings exclusively. 

I think that is a good plan.

> When I set the unicode flag
> in gettext.install() to 1 the gettext strings are unicode, however there's still a problem with the
> user input. As you guessed, "self.nextfile" is unicode only *sometimes*; I tried and changed the line
> from the old traceback into:
> 
>     if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):

How about

    n = self.nextfile
    if not isinstance(n, unicode):
        n = unicode(n, 'iso8859-1')

?
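
The same check could be wrapped in a small helper so every piece of user input goes through one place. This is only a sketch: the names to_text and text_type are made up for illustration, 'iso8859-1' is just the encoding from your line above, and the try/except exists only so the code also runs on interpreters that have no unicode builtin.

```python
# text_type is the unicode string type: `unicode` where that builtin
# exists, otherwise `str` (an assumption to keep the sketch portable).
try:
    text_type = unicode
except NameError:
    text_type = str


def to_text(value, encoding='iso8859-1'):
    """Return value as a unicode string (hypothetical helper).

    Byte strings are decoded with `encoding`; unicode strings are
    returned unchanged, so nothing gets decoded twice.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)
```

With that, the comparison becomes to_text(self.nextfile) == _('No destination file selected'), whichever type self.nextfile happens to be.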

> At least this might explain why "A\xe4" worked and "\xe4" not as I mentioned in a previous post.
> Now the problem arises how to determine if self.nextfile is unicode or a byte string?
> Or maybe even better, make sure that self.nextfile is always a byte string so I can safely convert
> it to unicode later on. But how to convert unicode user input into byte strings when I don't even
> know the user's encoding ? I guess this will require some further research.

Why do you need to convert back to byte strings?

You can find out the console encoding from sys.stdin and sys.stdout:
  >>> import sys
  >>> sys.stdout.encoding
  'cp437'
  >>> sys.stdin.encoding
  'cp437'

IIRC there is also an encoding associated with the current locale, but I'm not sure how to use that.
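
For the locale side, the standard library has locale.getpreferredencoding(), which returns the encoding the current locale prefers for text data. A rough sketch of combining it with the console encoding follows. The helper name guess_input_encoding and the 'utf-8' default are assumptions for illustration, and whether the GUI toolkit actually delivers input in this encoding is something you would have to check.

```python
import locale
import sys


def guess_input_encoding(default='utf-8'):
    """Guess an encoding for decoding user input (hypothetical helper)."""
    # sys.stdin.encoding may be missing or None, e.g. when input is piped.
    enc = getattr(sys.stdin, 'encoding', None)
    if not enc:
        # Fall back to the encoding preferred by the current locale.
        enc = locale.getpreferredencoding()
    # Last resort: the caller-supplied default.
    return enc or default
```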

> Unfortunately the latter is no option, because I definitely need portability. I guess I should probably use
> utf-8. 

UTF-8 is your friend :-)
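
A short illustration of why: UTF-8 can encode any unicode string and decodes back to the same string, so you can keep unicode inside the program and convert only at the edges. The variable names here are just for the example.

```python
text = u'A\xe4'                  # unicode string: 'A' followed by a-umlaut
data = text.encode('utf-8')      # byte string, safe to write to a file
restored = data.decode('utf-8')  # back to the same unicode string
```

Here data is the three bytes 'A\xc3\xa4', and restored == text.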

Kent


