Issue 15906; regression in argparse in Python 3.3, 3.2, and 2.7

Issue 15906 describes a problem with argparse that is breaking lots of code in Ubuntu. This is a recent regression caused by the fix for issue 12776, and it affects Python 2.7, 3.2, and 3.3. I posted a diff that should fix the problem, but at the heart of it is a semantic ambiguity in argparse that needs clarification. This needs to be cleared up before a proper patch can be applied. I have submitted a patch for what *I* think reasonable semantics should be, but let's see what you think. Issue 12776 tried to fix a problem illustrated by this example: -----snip snip----- import argparse parser = argparse.ArgumentParser() parser.add_argument('--file', type=open, default='/etc/passwd') args = parser.parse_args() print(args.file.read()) args = parser.parse_args(['--file', '/etc/group']) print(args.file.read()) -----snip snip----- What this code is (IMO, sensibly) trying to do is say that args.file will always be an open file, regardless of whether the path comes from the default value or is given on the command line. The problem is that this breaks some other, also sensible code: -----snip snip----- import argparse parser = argparse.ArgumentParser() parser.add_argument("--test", dest="test", type=str, default=[], action='append') args = parser.parse_args(['--test', 'bar']) args.test.append('baz') args = parser.parse_args() args.test.append('baz') -----snip snip----- This code is saying, I want to accumulate string arguments into a list, regardless of whether any arguments are given on the command line. The fix for issue 12776 broke the last two lines of the second example, because in the no-command-line-arguments-given case, arg.test is the *string* "[]" and not the actual empty list object. It seems to me that the semantics could reasonably be implied to mean that the type converter should only be applied to the default value when action='store', as is the default. Then in the second example, because action='append', the type conversion would not be applied (it makes no sense to do so). I have attached a diff to issue 15906 that implements these semantics. If you agree, then I will apply this to all of 3.3, 3.2, and 2.7, which are all affected by this bug (because the original fix for 12776 was applied to all three branches). Georg, I would like to apply this to the 3.3 branch. Cheers, -Barry

On Tue, 11 Sep 2012 11:34:30 -0400, Barry Warsaw <barry@python.org> wrote:
Actually, what 12776 was trying to fix was that the conversion function was *always* called, and thus the file was *always* opened, even when a file argument was passed in the arguments to be parsed.
But apparently in doing so we broke something else. There is no question that the current state of the tip of each branch is a regression. So at a minimum the old behavior needs to be restored, which is that the above code should do as you say: args.test should end up initialized to the empty list. As interesting aside: if the above is changed to args = parser.parse_args(['--test], 'x']) the append will work. The fact that that is true is doubtless a clue as to why the fix for 12776 broke things unexpectedly.
There is another possible semantic, which is that when the store type is append, the converter should be applied to each of the individual items in the default list. Which brings us to another issue: as things stand now, if we have, say, 'default=['abc']', then passing in '--test x' on the command line would result in args.test being equal to ['abc', 'x'] which is consistent with optparse but not necessarily the desired semantics.
The 12776 fix isn't going to be in 3.3, so I don't think this is a pressing issue. We can take our time to make sure we have the correct fix. It is, however, a release blocker for 2.7.4, 3.2.4, and 3.3.1. --David

On Sep 11, 2012, at 12:19 PM, R. David Murray wrote:
Are you sure about that? % ./python Python 3.3.0rc2+ (default:6fea947edead, Sep 11 2012, 15:03:16) [GCC 4.7.1 20120908 (prerelease)] on linux Type "help", "copyright", "credits" or "license" for more information.
Cheers, -Barry

On Sep 11, 2012, at 03:08 PM, Barry Warsaw wrote:
Never mind. Georg didn't pull that patch into his release clone. (Aside: Georg, maybe you could fiddle with the default branch's version numbers so that it reflects being 3.3.1a1?) (Aside 2: releasing/3.3.0 is broken: Traceback (most recent call last): File "/home/barry/projects/python/3.3.0/Parser/asdl.py", line 309, in visit meth(object, *args) File "./Parser/asdl_c.py", line 1043, in visitSum self.simpleSum(sum, name) File "./Parser/asdl_c.py", line 1067, in simpleSum self.emit("default:" % name, 2) TypeError: not all arguments converted during string formatting make: *** [Python/Python-ast.c] Error 1 ) -Barry


On Sep 11, 2012, at 12:19 PM, R. David Murray wrote:
Yep. Maybe for 3.4 <wink>.
Right. This gets to Chris's observation that we're blurring the meaning of 'default'. For action='append' should that be the default value when no arguments are given, or the starting value for the append operations? We might need to add a 'start' argument when action='append'. Cheers, -Barry

On 9/11/2012 11:34 AM, Barry Warsaw wrote:
This second example strikes me (naively, as English speaker but not argparse user) as 'wrong' in that 'default' is being misused to mean 'start value that is always used to generate the final value' [as in sum(iterable, start=0)], rather than 'final value that is only used if nothing else is given' (as in nearly all uses of 'default' in Python). Perhaps this is what you meant by "semantic ambiguity". As I see it, storing is done *with* a default or explicit value, appending is done *to* a start value *with* whatever. Perhaps reusing 'default' instead of using a new name such as 'start' was a bit too clever for our own good ;-).
This seems even more wrong (as in slightly crazy) as it switches the meaning of 'default' within one parser example rather than between parser examples.
I arrive at the same conclusion, I believe, by saying that for a given parser, the type converter should always or never be applied to 'default', which should mean converting or not when the parser is created. Append to 'default as base or start' should mean not converting. -- Terry Jan Reedy

On Tue, Sep 11, 2012 at 10:17 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I noted this in the comment to the issue, but the argparse documentation says this about the type argument: "type= can take any callable that takes a single string argument and returns the converted value." (from http://docs.python.org/dev/library/argparse.html#type ) So type is meant for converting (command-line) strings to other types. What is causing one of the problems here is calling type on a non-string (an empty list in one of the cases above), which doesn't look like it's meant to be supported.. So another way out could simply be not to call type on non-strings. Indeed, this was done before. One of the changes that was made in the patch for issue 12776 was to remove the str type check prior to calling type. --Chris

On Sep 11, 2012, at 01:17 PM, Terry Reedy wrote:
Well, in a sense yes, that is an ambiguity so I won't quibble about whether it's the same one or not. :)
I suspect that it's too late to change this, by say adding a 'start' parameter or some such, at least until Python 3.4.
And yet, that's how it works in 2.7, 3.2, and 3.3.1.
Does that mean that for fixing the regression, you favor applying the type conversion only for action='store' or by only applying it when the default is a string? It seems better to only apply the type conversion for action='store' but more backward compatible for the original behavior to only apply it when default is a string. Cheers, -Barry

On 9/11/2012 3:31 PM, Barry Warsaw wrote:
On Sep 11, 2012, at 01:17 PM, Terry Reedy wrote:
Such cleverness is something that I might have done, having been 'raised' on IBM mainframes with statements with 8-10 position-only parameters (JCL DD? don't quite remember), and having had to carefully count commas writing things like XX arg1, arg2,,,arg5,,,arg8,, The cost of positions rewards multiplexing the meaning of positions. The availability of keyword-only parameters changes the tradeoffs.
In all 2.7 or 3.2?
How about a third rule which amounts to the second *plus* (maybe) the first: apply the conversion when it fulfills the purpose of having type conversions, which is to make sure that the caller get objects of the desired class. Also, only add necessary restrictions to argparse users. In particular: parser.parse_args gets a sequence of strings. But callers of .parse_args often want non-string objects. argparse could return args as named strings and make the user convert to the types wanted, but there are good reasons to do the conversion at the point of parsing. So the conversion function *must* accept strings, but it seems an unnecessary restriction to require it to accept anything else, including its output type or a non-string 'default'. When the 'default' is a substitute for user input, as with Store, *and* it is a string, it should be converted just like user input. But the parser designer should not have to convert a non-string default object to a string just so it can be converted back to the real default non-string object (another unnecessary restriction). (If type=int, there is no sense in requiring default='100' instead of default=100.) Indeed, the default object might not even be a possible output of the conversion function (for instance, type=int, default=None). If the 'default' is not a backup/replacement for a string input to parse_args, as with append, then it seems it should not be converted either.
Are there any other actions where the default might be a string that should be converted the same way as user input? If not, both rules apply. As I said, I am not familiar with the details of argparse (or the previous bugfix), so I let you sort out how these thoughts apply to previous, current, and desired future behavior and doc claims. -- Terry Jan Reedy

On Sep 11, 2012, at 05:30 PM, Terry Reedy wrote:
And yet, that's how it works in 2.7, 3.2, and 3.3.1.
In all 2.7 or 3.2?
It will be broken in 2.7.4 and 3.2.4, but the Ubuntu packages in 12.10 are currently affected because they pulled in some Mercurial updates which included these unreleased changes. That's where it was first noticed.
I've tried the various suggestions out, and I think from a practical point of view, a fix for the regression in the 2.7, 3.2 and 3.3 branches should be to apply the one line check for action being a _StoreAction instance. This seems to be the easiest way to preserve the current behavior (the fix for issues #12776 and #11839) and fix issue #15906. I'll update the issue and apply this change to the three branches. Georg can ignore the change for 3.3.0. -Barry

On Tue, 11 Sep 2012 11:34:30 -0400, Barry Warsaw <barry@python.org> wrote:
Actually, what 12776 was trying to fix was that the conversion function was *always* called, and thus the file was *always* opened, even when a file argument was passed in the arguments to be parsed.
But apparently in doing so we broke something else. There is no question that the current state of the tip of each branch is a regression. So at a minimum the old behavior needs to be restored, which is that the above code should do as you say: args.test should end up initialized to the empty list. As interesting aside: if the above is changed to args = parser.parse_args(['--test], 'x']) the append will work. The fact that that is true is doubtless a clue as to why the fix for 12776 broke things unexpectedly.
There is another possible semantic, which is that when the store type is append, the converter should be applied to each of the individual items in the default list. Which brings us to another issue: as things stand now, if we have, say, 'default=['abc']', then passing in '--test x' on the command line would result in args.test being equal to ['abc', 'x'] which is consistent with optparse but not necessarily the desired semantics.
The 12776 fix isn't going to be in 3.3, so I don't think this is a pressing issue. We can take our time to make sure we have the correct fix. It is, however, a release blocker for 2.7.4, 3.2.4, and 3.3.1. --David

On Sep 11, 2012, at 12:19 PM, R. David Murray wrote:
Are you sure about that? % ./python Python 3.3.0rc2+ (default:6fea947edead, Sep 11 2012, 15:03:16) [GCC 4.7.1 20120908 (prerelease)] on linux Type "help", "copyright", "credits" or "license" for more information.
Cheers, -Barry

On Sep 11, 2012, at 03:08 PM, Barry Warsaw wrote:
Never mind. Georg didn't pull that patch into his release clone. (Aside: Georg, maybe you could fiddle with the default branch's version numbers so that it reflects being 3.3.1a1?) (Aside 2: releasing/3.3.0 is broken: Traceback (most recent call last): File "/home/barry/projects/python/3.3.0/Parser/asdl.py", line 309, in visit meth(object, *args) File "./Parser/asdl_c.py", line 1043, in visitSum self.simpleSum(sum, name) File "./Parser/asdl_c.py", line 1067, in simpleSum self.emit("default:" % name, 2) TypeError: not all arguments converted during string formatting make: *** [Python/Python-ast.c] Error 1 ) -Barry


On Sep 11, 2012, at 12:19 PM, R. David Murray wrote:
Yep. Maybe for 3.4 <wink>.
Right. This gets to Chris's observation that we're blurring the meaning of 'default'. For action='append' should that be the default value when no arguments are given, or the starting value for the append operations? We might need to add a 'start' argument when action='append'. Cheers, -Barry

On 9/11/2012 11:34 AM, Barry Warsaw wrote:
This second example strikes me (naively, as English speaker but not argparse user) as 'wrong' in that 'default' is being misused to mean 'start value that is always used to generate the final value' [as in sum(iterable, start=0)], rather than 'final value that is only used if nothing else is given' (as in nearly all uses of 'default' in Python). Perhaps this is what you meant by "semantic ambiguity". As I see it, storing is done *with* a default or explicit value, appending is done *to* a start value *with* whatever. Perhaps reusing 'default' instead of using a new name such as 'start' was a bit too clever for our own good ;-).
This seems even more wrong (as in slightly crazy) as it switches the meaning of 'default' within one parser example rather than between parser examples.
I arrive at the same conclusion, I believe, by saying that for a given parser, the type converter should always or never be applied to 'default', which should mean converting or not when the parser is created. Append to 'default as base or start' should mean not converting. -- Terry Jan Reedy

On Tue, Sep 11, 2012 at 10:17 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I noted this in the comment to the issue, but the argparse documentation says this about the type argument: "type= can take any callable that takes a single string argument and returns the converted value." (from http://docs.python.org/dev/library/argparse.html#type ) So type is meant for converting (command-line) strings to other types. What is causing one of the problems here is calling type on a non-string (an empty list in one of the cases above), which doesn't look like it's meant to be supported.. So another way out could simply be not to call type on non-strings. Indeed, this was done before. One of the changes that was made in the patch for issue 12776 was to remove the str type check prior to calling type. --Chris

On Sep 11, 2012, at 01:17 PM, Terry Reedy wrote:
Well, in a sense yes, that is an ambiguity so I won't quibble about whether it's the same one or not. :)
I suspect that it's too late to change this, by say adding a 'start' parameter or some such, at least until Python 3.4.
And yet, that's how it works in 2.7, 3.2, and 3.3.1.
Does that mean that for fixing the regression, you favor applying the type conversion only for action='store' or by only applying it when the default is a string? It seems better to only apply the type conversion for action='store' but more backward compatible for the original behavior to only apply it when default is a string. Cheers, -Barry

On 9/11/2012 3:31 PM, Barry Warsaw wrote:
On Sep 11, 2012, at 01:17 PM, Terry Reedy wrote:
Such cleverness is something that I might have done, having been 'raised' on IBM mainframes with statements with 8-10 position-only parameters (JCL DD? don't quite remember), and having had to carefully count commas writing things like XX arg1, arg2,,,arg5,,,arg8,, The cost of positions rewards multiplexing the meaning of positions. The availability of keyword-only parameters changes the tradeoffs.
In all 2.7 or 3.2?
How about a third rule which amounts to the second *plus* (maybe) the first: apply the conversion when it fulfills the purpose of having type conversions, which is to make sure that the caller get objects of the desired class. Also, only add necessary restrictions to argparse users. In particular: parser.parse_args gets a sequence of strings. But callers of .parse_args often want non-string objects. argparse could return args as named strings and make the user convert to the types wanted, but there are good reasons to do the conversion at the point of parsing. So the conversion function *must* accept strings, but it seems an unnecessary restriction to require it to accept anything else, including its output type or a non-string 'default'. When the 'default' is a substitute for user input, as with Store, *and* it is a string, it should be converted just like user input. But the parser designer should not have to convert a non-string default object to a string just so it can be converted back to the real default non-string object (another unnecessary restriction). (If type=int, there is no sense in requiring default='100' instead of default=100.) Indeed, the default object might not even be a possible output of the conversion function (for instance, type=int, default=None). If the 'default' is not a backup/replacement for a string input to parse_args, as with append, then it seems it should not be converted either.
Are there any other actions where the default might be a string that should be converted the same way as user input? If not, both rules apply. As I said, I am not familiar with the details of argparse (or the previous bugfix), so I let you sort out how these thoughts apply to previous, current, and desired future behavior and doc claims. -- Terry Jan Reedy

On Sep 11, 2012, at 05:30 PM, Terry Reedy wrote:
And yet, that's how it works in 2.7, 3.2, and 3.3.1.
In all 2.7 or 3.2?
It will be broken in 2.7.4 and 3.2.4, but the Ubuntu packages in 12.10 are currently affected because they pulled in some Mercurial updates which included these unreleased changes. That's where it was first noticed.
I've tried the various suggestions out, and I think from a practical point of view, a fix for the regression in the 2.7, 3.2 and 3.3 branches should be to apply the one line check for action being a _StoreAction instance. This seems to be the easiest way to preserve the current behavior (the fix for issues #12776 and #11839) and fix issue #15906. I'll update the issue and apply this change to the three branches. Georg can ignore the change for 3.3.0. -Barry
participants (5)
-
Barry Warsaw
-
Chris Jerdonek
-
Georg Brandl
-
R. David Murray
-
Terry Reedy