kent37 at tds.net
Thu Feb 24 05:16:20 CET 2005
Michael Lange wrote:
> now it looks like the total confusion seems to clear up (at least partially). After some googling it
> seems to me that the best bet is to use unicode strings exclusively.
I think that is a good plan.
> When I set the unicode flag
> in gettext.install() to 1 the gettext strings are unicode, however there's still a problem with the
> user input. As you guessed, "self.nextfile" is unicode only *sometimes*; I tried and changed the line
> from the old traceback into:
> if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):
n = self.nextfile
if not isinstance(n, unicode):
    n = unicode(n, 'iso8859-1')
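
If the same check is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the name to_text is made up for illustration, and 'iso8859-1' is just the encoding from your example, not something detected. It tests for bytes rather than unicode so that it also runs on newer Python versions, where the unicode type no longer exists.

```python
def to_text(value, encoding='iso8859-1'):
    """Return value as a unicode (text) string.

    Hypothetical helper: byte strings are decoded with the given
    encoding; anything that is already text is returned unchanged.
    The default encoding is an assumption taken from the example.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

With this, the comparison would read: if to_text(self.nextfile) == _('No destination file selected'):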
> At least this might explain why "A\xe4" worked and "\xe4" not as I mentioned in a previous post.
> Now the problem arises how to determine if self.nextfile is unicode or a byte string?
> Or maybe even better, make sure that self.nextfile is always a byte string so I can safely convert
> it to unicode later on. But how to convert unicode user input into byte strings when I don't even
> know the user's encoding ? I guess this will require some further research.
Why do you need to convert back to byte strings?
You can find out the console encoding from sys.stdin and stdout:
>>> import sys
IIRC there is also an encoding associated with the current locale, I'm not sure how to use that.
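
As a rough sketch of how the two sources might be combined: sys.stdin.encoding can be None (for example when input is piped), and locale.getpreferredencoding() reports the encoding the locale prefers. The function name guess_input_encoding and the 'utf-8' fallback are assumptions of mine, not something your program needs to use as-is.

```python
import locale
import sys


def guess_input_encoding(default='utf-8'):
    """Best-effort guess at the encoding of user input.

    Hypothetical helper: tries the console encoding first, then the
    locale's preferred encoding, then an assumed default.
    """
    # sys.stdin may lack an encoding attribute or have it set to None
    encoding = getattr(sys.stdin, 'encoding', None)
    if not encoding:
        encoding = locale.getpreferredencoding()
    return encoding or default


if __name__ == '__main__':
    print(guess_input_encoding())
```

This only tells you what the console or locale claims; input coming from a GUI toolkit may follow different rules.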
> Unfortunately the latter is no option, because I definitely need portability. I guess I should probably use
UTF-8 is your friend :-)
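
If you do end up needing byte strings somewhere (writing to a file, say), a minimal sketch of the round trip looks like this. UTF-8 can encode any unicode string, so the conversion does not depend on the user's locale. The sample text is just the "A\xe4" from your earlier post.

```python
# Unicode text inside the program (sample value for illustration)
text = u'A\xe4'

# Encode to bytes only at the boundary, e.g. just before writing out
data = text.encode('utf-8')

# Decoding with the same encoding gives back the original text
restored = data.decode('utf-8')
```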