Safely decoding user input
Tom Miller
tom.h.miller at gmail.com
Thu Sep 2 08:31:28 EDT 2010
Hello everyone,
Before I pose my question, I should mention that I'm still pretty unfamiliar
with proper terminology for string encoding, so I might get some of it
wrong. Please bear with me.
I'm writing a program that accepts arguments from the command line. Some of
my users run Windows with a non-Unicode locale setting and pass characters
outside the ASCII range. So something like
$ program --option <Cyrillic text>
ultimately results in "UnicodeDecodeError: 'utf8' codec can't decode bytes
in position 0-3: invalid data"
My questions:
1) Is it safe to immediately decode all strings in sys.argv with something
like sys.argv = [arg.decode(sys.stdin.encoding) for arg in sys.argv]?
2) Can something similar be done to anything returned by raw_input()?
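To make both questions concrete, here is a minimal sketch of the kind of
helper I have in mind. The names guess_encoding and safe_decode are made up
for illustration, and the fallback order (stream encoding, then locale, then
UTF-8) is only an assumption on my part, not something I know to be correct.
As far as I can tell, sys.stdin.encoding can be None when input is
redirected, which is why the sketch does not rely on it alone.

```python
import locale
import sys


def guess_encoding(stream=None):
    """Hypothetical helper: stream's encoding, else the locale's, else UTF-8."""
    enc = getattr(stream, "encoding", None)  # may be None if redirected
    return enc or locale.getpreferredencoding() or "utf-8"


def safe_decode(raw, encoding=None):
    """Hypothetical helper: decode byte strings, pass text through unchanged.

    Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    """
    if not isinstance(raw, bytes):
        return raw  # already text, nothing to decode
    return raw.decode(encoding or guess_encoding(sys.stdin), "replace")


# Decode every command-line argument once, right at startup.
args = [safe_decode(arg) for arg in sys.argv[1:]]
```

The same safe_decode call could presumably wrap whatever raw_input()
returns. Using "replace" trades silent data loss for never crashing, which
may or may not be the right choice.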
Thanks,
Tom