Safely decoding user input
tom.h.miller at gmail.com
Thu Sep 2 14:31:28 CEST 2010
Before I pose my question, I should mention that I'm still pretty unfamiliar
with proper terminology for string encoding, so I might get some of it
wrong. Please bear with me.
I'm writing a program that accepts arguments from the command line. Some of
my users are using Windows with a non-Unicode locale setting and characters
outside of the ASCII set. So something like
$ program --option <Cyrillic text>
ultimately results in "UnicodeDecodeError: 'utf8' codec can't decode bytes
in position 0-3: invalid data"
1) Is it safe to immediately decode all strings in sys.argv with something
like sys.argv = [string.decode(sys.stdin.encoding) for string in sys.argv]?
2) Can something similar be done to anything returned by raw_input()?
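To make the two questions concrete, here is a minimal sketch of the kind of
decoding being asked about. It is not a confirmed answer. The helper name
decode_arg, the list of candidate encodings, and the final latin-1 fallback
are all assumptions made for illustration. One known wrinkle it tries to
cover: sys.stdin.encoding can be None (for example when input is piped), so
relying on it alone may fail.

```python
import locale
import sys


def decode_arg(raw, encodings=None):
    """Return text for a command-line argument (hypothetical helper).

    Byte strings are decoded with the first candidate encoding that works;
    values that are already text are returned unchanged.
    """
    if not isinstance(raw, bytes):
        return raw  # already text (always the case for sys.argv on Python 3)
    if encodings is None:
        # Assumed candidate order: console encoding, locale encoding, UTF-8.
        encodings = [
            getattr(sys.stdin, "encoding", None),  # may be None when piped
            locale.getpreferredencoding(),
            "utf-8",
        ]
    for enc in encodings:
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    # Assumed last resort: latin-1 never raises, but may give wrong characters.
    return raw.decode("latin-1")
```

With this in place, question 1 would look like
sys.argv = [decode_arg(arg) for arg in sys.argv], and for question 2 the
value returned by raw_input() could presumably be passed through the same
helper. Whether that is actually safe on Windows is exactly what I am asking.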