Safely decoding user input

Tom Miller tom.h.miller at gmail.com
Thu Sep 2 08:31:28 EDT 2010


Hello everyone,

Before I pose my question, I should mention that I'm still pretty unfamiliar
with proper terminology for string encoding, so I might get some of it
wrong. Please bear with me.

I'm writing a program that accepts arguments from the command line. Some of
my users run Windows with a non-Unicode locale setting and pass arguments
containing characters outside the ASCII set. So something like

$ program --option <Cyrillic text>

ultimately results in "UnicodeDecodeError: 'utf8' codec can't decode bytes
in position 0-3: invalid data"

My questions:
1) Is it safe to immediately decode all strings in sys.argv with something
like sys.argv = [string.decode(sys.stdin.encoding) for string in sys.argv]?
2) Can something similar be done to anything returned by raw_input()?
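
To make question 1 concrete, here is a minimal sketch of the kind of
defensive decoding being asked about. It is an illustration under stated
assumptions, not a confirmed answer: the helper name safe_decode and the
order of the candidate encodings are hypothetical choices, and
sys.stdin.encoding may be None (for example when input is piped) or may
differ from the encoding the arguments actually arrive in.

```python
import locale
import sys


def safe_decode(raw, encodings=None):
    """Decode a byte string by trying candidate encodings in order.

    safe_decode is a hypothetical helper name; the candidate list below
    is an assumption, not a guaranteed-correct choice for every platform.
    """
    if not isinstance(raw, bytes):
        # Already text (for example, sys.argv items on Python 3).
        return raw
    if encodings is None:
        encodings = [
            getattr(sys.stdin, "encoding", None),  # may be None when piped
            locale.getpreferredencoding(),
            "utf-8",
        ]
    for enc in encodings:
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", "replace")


# Decode every command-line argument without risking UnicodeDecodeError.
args = [safe_decode(arg) for arg in sys.argv]
```

The same helper could be applied to the value returned by raw_input().
Note that single-byte encodings such as cp1251 rarely raise an error on
arbitrary bytes, so the order of the candidates decides the result; the
fallback only guarantees that no exception escapes, not that the text is
decoded correctly.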

Thanks,
Tom