[Tutor] Unicode issues (was: UnicodeDecodeError)
Michael Lange
klappnase at freenet.de
Thu Feb 24 11:27:13 CET 2005
On Wed, 23 Feb 2005 23:16:20 -0500
Kent Johnson <kent37 at tds.net> wrote:
> How about
> n = self.nextfile
> if not isinstance(n, unicode):
>     n = unicode(n, 'iso8859-1')
> ?
>
> > At least this might explain why "A\xe4" worked and "\xe4" not as I mentioned in a previous post.
> > Now the problem arises how to determine if self.nextfile is unicode or a byte string?
> > Or maybe even better, make sure that self.nextfile is always a byte string so I can safely convert
> > it to unicode later on. But how to convert unicode user input into byte strings when I don't even
> > know the user's encoding ? I guess this will require some further research.
>
> Why do you need to convert back to byte strings?
>
> You can find out the console encoding from sys.stdin and stdout:
> >>> import sys
> >>> sys.stdout.encoding
> 'cp437'
> >>> sys.stdin.encoding
> 'cp437'
>
I *thought* I would have to convert the user input, which might be in any encoding, back into a
byte string first (remember, I got heavily confused because user input was sometimes unicode and
sometimes a byte string), so that I could convert it to "standard" unicode (utf-8) later on.
I've added this test to the file selection method, where "result" holds the filename the user chose:
if isinstance(result, unicode):
    result = result.encode('iso8859-1')
return result
Later on, self.nextfile is set to "result".
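For comparison, Kent's suggestion quoted above goes in the opposite direction: leave unicode objects alone and decode only byte strings. A minimal sketch of that idea follows. The helper name to_text and the text_type alias are hypothetical, not part of the program, and the 'iso8859-1' default is only the codec used elsewhere in this thread, not a known fact about the user's system.

```python
# Hypothetical helper, not taken from the original program.
# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


def to_text(value, encoding="iso8859-1"):
    """Return value as a unicode object, decoding it if it is a byte string."""
    if isinstance(value, text_type):
        # Already unicode: nothing to decode.
        return value
    # Byte string: decode with the assumed encoding.
    return value.decode(encoding)
```

With something like this, self.nextfile could be passed through to_text() once, and everything after that point would only ever see unicode.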
The idea was, if I could catch the user's encoding, I could do something like:
if isinstance(result, unicode):
    result = result.encode(sys.stdin.encoding)
result = unicode(result, 'utf-8')
to avoid problems with unicode objects that have different encodings - or isn't this necessary at all?
I'm sorry if this is a dumb question, but I'm afraid I'm a complete encoding-idiot.
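A small experiment that may help with the question above; it is only a sketch, and the variable names are made up for illustration. It encodes one unicode object with two codecs and decodes the results again. It suggests that the encoding belongs to the byte string, not to the unicode object, and that decoding with a codec other than the one that produced the bytes can fail.

```python
text = u"\xe4"  # a-umlaut as a unicode object

# The same unicode object gives different bytes under different codecs.
latin1_bytes = text.encode("iso8859-1")
utf8_bytes = text.encode("utf-8")

# Decoding each byte string with its own codec gives back equal unicode objects.
round_trip_equal = (
    latin1_bytes.decode("iso8859-1") == text
    and utf8_bytes.decode("utf-8") == text
)

# Decoding Latin-1 bytes as UTF-8 raises UnicodeDecodeError for this input.
try:
    latin1_bytes.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

If this holds in general, the encode-then-decode step in the snippet above would only work when sys.stdin.encoding happens to be 'utf-8', and a result that is already unicode would need no conversion at all.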
Thanks and best regards
Michael