optparse escaping control characters
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Aug 19 09:21:07 EDT 2008
On Tue, 19 Aug 2008 05:35:27 -0700, wannymahoots wrote:
> optparse seems to be escaping control characters that I pass as
> arguments on the command line. Is this a bug? Am I missing something?
> Can this be prevented, or worked around?
You are misinterpreting the evidence. Here's the short explanation:
optparse isn't escaping a control character, because you're not supplying
it with a control character. You're supplying it with two normal
characters, which merely *look* like five (including the quote marks)
because of Python's special handling of backslashes.
If you need it, here's the long-winded explanation.
I've made a small change to your test.py file to demonstrate:
# test.py (modified)
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-d", dest="delimiter", action="store")
(options, args) = parser.parse_args()
print "Options:", options
print "str of options.delimiter =", str(options.delimiter)
print "repr of options.delimiter =", repr(options.delimiter)
print "len of options.delimiter =", len(options.delimiter)
Here's what it does when I call it:
$ python test.py -d '\t'
Options: {'delimiter': '\\t'}
str of options.delimiter = \t
repr of options.delimiter = '\\t'
len of options.delimiter = 2
When you pass '\t' in the command line, the shell sends a literal
backslash followed by a lowercase t to Python. That is, it sends the
literal string '\t', not a control character.
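Proof: pass the same string to the "wc" program using "echo". Don't
forget that echo adds a newline to the string. (This assumes a shell such
as bash, whose built-in echo leaves backslashes alone by default; the
echo in some other shells turns \t into a tab.)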
$ echo 't' | wc # just a t
1 1 2
$ echo '\t' | wc # a backslash and a t, not a control character
1 1 3
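The same check can be made from Python itself. This is only a sketch and not part of the original test.py: it bypasses the shell and hands a child Python process the two characters backslash and t directly, which is what a POSIX shell does with a single-quoted '\t'. The names child, output and lines are made up for the illustration, and print() is used so the code also runs on Python 3.

```python
import subprocess
import sys

# Child program: report the length and repr of its first argument.
child = "import sys; print(len(sys.argv[1])); print(repr(sys.argv[1]))"

# "\\t" in this source file is two characters: a backslash and a t.
# Passing an argument list (no shell involved) hands them over unchanged.
output = subprocess.check_output([sys.executable, "-c", child, "\\t"])
lines = output.decode("ascii").split()

print(lines[0])   # 2
print(lines[1])   # '\\t'
```

The child sees an argument of length two, so nothing between the command line and sys.argv has turned it into a tab.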
That's the first half of the puzzle. Now the second half -- why is Python
adding a *second* backslash to the backslash-t? Actually, it isn't, but
it *seems* to be adding not just a second backslash but also two quote
marks.
The backslash in Python is special. If you wanted a literal backslash t
in a Python string, you would have to type *two* backslashes:
'\\t'
because a single backslash followed by t is an escape sequence, which
Python turns into a tab character.
But be careful to note that even though you typed five characters (quote,
backslash, backslash, t, quote) Python creates a string of length two: a
single backslash and a t.
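A quick way to convince yourself, as a small sketch with made-up variable names (print() is used so it also runs on Python 3; the raw-string form is an extra not mentioned above):

```python
tab = '\t'           # escape sequence: a single tab character
two_chars = '\\t'    # escaped backslash, then t: two characters
raw = r'\t'          # raw string: the backslash is kept literally

print(len(tab))             # 1
print(len(two_chars))       # 2
print(two_chars == raw)     # True
print(tab == two_chars)     # False
```

So '\\t' and r'\t' are two spellings of the same two-character string, and neither is a tab.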
Now, when you print something using the str() function, Python hides all
that complexity from you. Hence the line of output that looks like this:
str of options.delimiter = \t
The argument is a literal backslash followed by a t, not a tab character.
But when you print using the repr() function, Python shows you what you
would have typed in Python source code to get that string -- five
characters as follows:
repr of options.delimiter = '\\t'
But that's just the *display* of a two character string. The actual
string itself is only two characters, despite the two quotes and the two
backslashes.
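The difference between the two displays can be seen side by side. This is a minimal sketch with illustrative names (s, as_str, as_repr), written with print() so it also runs on Python 3:

```python
s = '\\t'              # two characters: backslash, t

as_str = str(s)        # the characters themselves
as_repr = repr(s)      # the source-code form, quotes included

print(as_str)          # \t
print(as_repr)         # '\\t'
print(len(s))          # 2
print(len(as_repr))    # 5
```

The repr is itself a five-character string describing the two-character one; the data has not changed, only the way it is displayed.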
Now for the final piece of the puzzle: when you print most composite
objects, like the optparse Values object -- the object named "options" in
your code -- Python prints their contents using repr() rather than
str(). That is why the "Options:" line shows '\\t' with two backslashes,
even though the string itself holds only one.
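To see this without going through the command line, here is a sketch that reuses the parser from test.py but passes the arguments explicitly to parse_args() instead of reading sys.argv. The names container_view and direct_view are made up for the illustration, and print() is used so it also runs on Python 3.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-d", dest="delimiter", action="store")

# Supply the arguments explicitly instead of reading sys.argv;
# "\\t" is the two-character string backslash + t.
options, args = parser.parse_args(["-d", "\\t"])

container_view = str(options)            # contents are shown via repr()
direct_view = str(options.delimiter)     # the string itself

print(container_view)    # {'delimiter': '\\t'}
print(direct_view)       # \t
print(str(['\\t']))      # ['\\t']  (a plain list behaves the same way)
```

Printing the attribute directly, rather than the whole options object, shows the string as it really is.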
--
Steven