potential argparse problem: bad mix of parse_known_args and prefix matching
Hello, argparse does prefix matching as long as there are no conflicts. For example: argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args() If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs). If there's another argument registered, say "--sync-bar" the above will fail due to a conflict. Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it. Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called. Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO). Eli
I think matching on the shortest unique prefix is common for command line parsers in general, not just argparse. I believe optparse did this too, and even the venerable getopt does! I think all this originated in the original (non-Python) GNU standard for long option parsing. All that probably explains why the docs hardly touch upon it. As to why parse_known_args also does this, I can see the reasoning behind this behavior: to the end user, "--sync" is a valid option, so it would be surprising if it didn't get recognized under certain conditions. I suppose you were badly bitten by this recently? Can you tell us more about what happened? On Tue, Nov 26, 2013 at 9:30 AM, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
Eli
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
On Tue, Nov 26, 2013 at 9:38 AM, Guido van Rossum <guido@python.org> wrote:
I think matching on the shortest unique prefix is common for command line parsers in general, not just argparse. I believe optparse did this too, and even the venerable getopt does! I think all this originated in the original (non-Python) GNU standard for long option parsing. All that probably explains why the docs hardly touch upon it.
As to why parse_known_args also does this, I can see the reasoning behind this behavior: to the end user, "--sync" is a valid option, so it would be surprising if it didn't get recognized under certain conditions.
I suppose you were badly bitten by this recently? Can you tell us more about what happened?
Sure. We have a Python script that serves as a gateway to another program. That other program has a "--sync" option. The gateway script has a "--sync-foo" option. When the gateway script is invoked with "--sync", we'd expect it to pass it to the program; instead, it matches it to its own "--sync-foo" and consumes the option. Practically, this means a big caveat on exactly the use case parse_known_args was designed for: whenever I have a Python script using argparse and passing unknown arguments to other programs, I have to manually make sure there are no common prefixes between any commands to avoid this problem. Frankly I don't see how the current behavior can be seen as the intended one. Eli
On Tue, Nov 26, 2013 at 9:30 AM, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
Eli
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
Eli did give his use case... a front end for a program that has a parameter "--sync", and a front end preprocessor of some sort was trying to use "--sync-foo" as an argument, and wanted "--sync" to be left in the parameters to send on to the back end program. Design of the front-end might better be aware of back end parameters and not conflict, but the documentation could be improved, likely. It might also be possible to add a setting to disable the prefix matching feature, for code that prefers it not be done. Whether that is better done as a global setting, or a per-parameter setting I haven't thought through. But both the constructor and the parameter definitions already accept a variable number of named parameters, so I would think it would be possible to add another, and retain backward compatibility via an appropriate default. On 11/26/2013 9:38 AM, Guido van Rossum wrote:
I think matching on the shortest unique prefix is common for command line parsers in general, not just argparse. I believe optparse did this too, and even the venerable getopt does! I think all this originated in the original (non-Python) GNU standard for long option parsing. All that probably explains why the docs hardly touch upon it.
As to why parse_known_args also does this, I can see the reasoning behind this behavior: to the end user, "--sync" is a valid option, so it would be surprising if it didn't get recognized under certain conditions.
I suppose you were badly bitten by this recently? Can you tell us more about what happened?
On Tue, Nov 26, 2013 at 9:30 AM, Eli Bendersky <eliben@gmail.com <mailto:eliben@gmail.com>> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
Eli
On Tue, Nov 26, 2013 at 9:53 AM, Glenn Linderman <v+python@g.nevcal.com>wrote:
Eli did give his use case... a front end for a program that has a parameter "--sync", and a front end preprocessor of some sort was trying to use "--sync-foo" as an argument, and wanted "--sync" to be left in the parameters to send on to the back end program.
Design of the front-end might better be aware of back end parameters and not conflict, but the documentation could be improved, likely.
It might also be possible to add a setting to disable the prefix matching feature, for code that prefers it not be done. Whether that is better done as a global setting, or a per-parameter setting I haven't thought through. But both the constructor and the parameter definitions already accept a variable number of named parameters, so I would think it would be possible to add another, and retain backward compatibility via an appropriate default.
FWIW I'm not advocating a breaking behavior change here - I fully realize the ship has sailed. I'm interested in mitigation actions, though. Making the documentation explain this explicitly + adding an option to disable prefix matching (in 3.5 since we're past the 3.4 features point) will go a long way for alleviating this gotcha. Eli
On 11/26/2013 9:38 AM, Guido van Rossum wrote:
I think matching on the shortest unique prefix is common for command line parsers in general, not just argparse. I believe optparse did this too, and even the venerable getopt does! I think all this originated in the original (non-Python) GNU standard for long option parsing. All that probably explains why the docs hardly touch upon it.
As to why parse_known_args also does this, I can see the reasoning behind this behavior: to the end user, "--sync" is a valid option, so it would be surprising if it didn't get recognized under certain conditions.
I suppose you were badly bitten by this recently? Can you tell us more about what happened?
On Tue, Nov 26, 2013 at 9:30 AM, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
Eli
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eliben%40gmail.com
On 26 November 2013 18:16, Eli Bendersky <eliben@gmail.com> wrote:
FWIW I'm not advocating a breaking behavior change here - I fully realize the ship has sailed. I'm interested in mitigation actions, though. Making the documentation explain this explicitly + adding an option to disable prefix matching (in 3.5 since we're past the 3.4 features point) will go a long way for alleviating this gotcha.
Prefix matching is pretty much the norm in my experience, so I don't think it should surprise people in general. That's not to say that being more explicit in the docs wouldn't be a good thing anyway. For the "passing unknown parameters through" case, I think having a means to disable prefix matching is good (and it's a reasonably important use case). It *does* seem to me to be obvious from the name that an alternative method named "parse_known_args" is the way to disable prefix matching. So I think it needs to be clearly documented that it *isn't* and precisely what purpose it is intended to serve. And it definitely doesn't do that at the moment. My feeling is that the following changes would be good: 1. Clarify in the documentation of parse_known_args that the function does not behave any differently from parse_args in terms of what arguments it accepts (and explicitly give an example showing that prefix arguments *are* consumed). This can be done for 3.4 as it doesn't change functionality. 2. Add a keyword argument to parse_known_args that allows the user to disable prefix matching (I'd suggest match_prefix with a default of True). This would be for 3.5. Paul
On 2013-11-26 18:16, Eli Bendersky wrote:
FWIW I'm not advocating a breaking behavior change here - I fully realize the ship has sailed. I'm interested in mitigation actions, though. Making the documentation explain this explicitly + adding an option to disable prefix matching (in 3.5 since we're past the 3.4 features point) will go a long way for alleviating this gotcha.
There is already the One Obvious Way to handle situations like yours: the user uses "--" to mark that the remaining arguments are pure arguments and not --options. parent-script --verbose -- ./child_script.py --no-verbose --etc This is a common idiom across many argument processors. parse_known_args() is not a good solution for this situation even if you mitigate the prefix issue. Exact matches of --options are still possible. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Tue, 26 Nov 2013 09:30:10 -0800, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
I'm not sure whether or not it is a bug, but there are a number of bugs involving argument prefix matching in the tracker. Including one to be able to disable it, that has a patch but didn't get into 3.4 (issue 14910). This topic seems more suited to the bug tracker than python-dev. Stephen hasn't been very active lately. There is at least one non-committer who has been, but no committers have made time to review the patches and take action (they are on my list, but there is a lot of stuff ahead of them). So if you'd like to help out in that regard, that would be great :) --David
On Tue, Nov 26, 2013 at 9:46 AM, R. David Murray <rdmurray@bitdance.com>wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using
On Tue, 26 Nov 2013 09:30:10 -0800, Eli Bendersky <eliben@gmail.com> wrote: parse_known_args)
knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
I'm not sure whether or not it is a bug, but there are a number of bugs involving argument prefix matching in the tracker. Including one to be able to disable it, that has a patch but didn't get into 3.4 (issue 14910).
This topic seems more suited to the bug tracker than python-dev.
Stephen hasn't been very active lately. There is at least one non-committer who has been, but no committers have made time to review the patches and take action (they are on my list, but there is a lot of stuff ahead of them). So if you'd like to help out in that regard, that would be great :)
Being able to disable prefix matching is certainly a good idea. I'll nosy myself on Issue 14910 and will try to find some time for it. Eli
On Tue, Nov 26, 2013 at 9:30 AM, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
argparse does prefix matching as long as there are no conflicts. For example:
argparser = argparse.ArgumentParser() argparser.add_argument('--sync-foo', action='store_true') args = argparser.parse_args()
If I pass "--sync" to this script, it recognizes it as "--sync-foo". This behavior is quite surprising although I can see the motivation for it. At the very least it should be much more explicitly documented (AFAICS it's barely mentioned in the docs).
If there's another argument registered, say "--sync-bar" the above will fail due to a conflict.
Now comes the nasty part. When using "parse_known_args" instead of "parse_args", the above happens too - --sync is recognized for --sync-foo and captured by the parser. But this is wrong! The whole idea of parse_known_args is to parse the known args, leaving unknowns alone. This prefix matching harms more than it helps here because maybe the program we're actually acting as a front-end for (and hence using parse_known_args) knows about --sync and wants to get it.
Unless I'm missing something, this is a bug. But I'm also not sure whether we can do anything about it at this point, as existing code *may* be relying on it. The right thing to do would be to disable this prefix matching when parse_known_args is called.
Again, at the very least this should be documented (for parse_known_args not less than a warning box, IMHO).
I created http://bugs.python.org/issue19814 for the documentation patch. http://bugs.python.org/issue14910 deals with making prefix matching optional, but that will have to be deferred to 3.5 Eli
participants (6)
-
Eli Bendersky -
Glenn Linderman -
Guido van Rossum -
Paul Moore -
R. David Murray -
Robert Kern