Hm, couldn't this be easily done with shlex?
From the homepage:
""" Frequently Asked Questions
Q: Hey, there is 'shlex' coming with Python. Why there is a need for this module? A: I know 'shlex' and I gave it a try. But 'shlex' takes quotes as word-delemiters which divers from the shell-semantic (see above). And even if 'shlex' would parse strings as needed, I would have written a (very, very) thin layer above, since 'shlex' is simple but seldomly used for this kind of job. """
I saw that after posting. :-( The argument "'shlex' is simple but seldomly used for this kind of job." seems circular though: "I'm not using shlex because it's rarely used" ???
I agree with him. Even setting aside the syntax divergence, shellwords is about half the size of shlex, and it's much more comfortable to use, allowing one-liners like "for opt in shellwords(line):".
I know I've wished for this once or twice, but not badly enough to bother solving the problem right. I worry that having too many ways to do mostly the same thing adds code bloat. Couldn't adding something even smaller on top of shlex provide the same interface and solve the syntactic divergence? --Guido van Rossum (home page: http://www.python.org/~guido/)
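A minimal sketch of the "something even smaller on top of shlex" idea, written against the shlex module in present-day Python 3 (which accepts strings directly and has `posix`, `whitespace_split` and `commenters` attributes). The function name `shellwords` is hypothetical, borrowed from the module being discussed; it gives the `for opt in shellwords(line):` one-liner interface with shell-style quoting.

```python
import shlex
from typing import List


def shellwords(line: str) -> List[str]:
    """Split a command line into words using shell-like quoting.

    Hypothetical helper name; it is only a thin wrapper around shlex.shlex.
    """
    lexer = shlex.shlex(line, posix=True)  # POSIX mode: quotes group characters, then are removed
    lexer.whitespace_split = True          # break words only on whitespace, like a shell
    lexer.commenters = ""                  # treat '#' as an ordinary character
    return list(lexer)                     # iterate until the lexer reports end of input


if __name__ == "__main__":
    line = "-o 'foo bar' x\"y z\""
    for opt in shellwords(line):
        print(opt)
```

Disabling comments is a choice made for this sketch so that a literal `#` survives as its own word; a shell would treat it as a comment start.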
[...]
Couldn't adding something even smaller on top of shlex provide the same interface and solve the syntactic divergence?
Ok, I'll check if there's an easy way to "turn" shlex into shellwords. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo> Ok, I'll check if there's an easy way to "turn" shlex into
Gustavo> shellwords.
Cool. Based on this thread and an experiment I tried, some obvious (to me) things come to mind:
* get_token() needs to be fixed to handle the 'bar'asd'foo' case
* the shlex class should handle strings as input, not just file-like objects
* get_word() or get_words() methods in the shlex class could implement the shellwords functionality
Skip
I'd be happy to see this done. You might submit the changes to ESR for review but he may be busy so don't wait for him. --Guido van Rossum (home page: http://www.python.org/~guido/)
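Skip's three points can be sketched as a small subclass of the shlex class in present-day Python 3. The class name `WordLexer` and its `get_words()` method are hypothetical (they follow the naming floated above, not any shipped API); the string input and the POSIX handling of the 'bar'asd'foo' case come from the real `shlex.shlex` constructor.

```python
import shlex
from typing import List


class WordLexer(shlex.shlex):
    """Hypothetical shlex subclass adding a get_words() method.

    Modern shlex already accepts a plain string as input (second point) and,
    in POSIX mode, joins 'bar'asd'foo' into a single word (first point).
    """

    def __init__(self, text: str) -> None:
        super().__init__(text, posix=True)  # string input, POSIX quoting rules
        self.whitespace_split = True        # split words only on whitespace

    def get_words(self) -> List[str]:
        """Return all remaining words (third point)."""
        words = []
        while True:
            word = self.get_token()
            if word == self.eof:            # self.eof is None in POSIX mode
                return words
            words.append(word)


if __name__ == "__main__":
    print(WordLexer("'bar'asd'foo' x").get_words())
```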
Great! I'll work on it.
What should we do to avoid compatibility problems? Some solutions that come to mind are:
- Forget about it completely and fix the syntax handling to be POSIX-compliant.
- Create a subclass of shlex, or a completely different class (shlex_posix?), depending on how much can be reused.
- Add a flag to the constructor.
Suggestions? -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Thinking further about this, I believe there's a better solution. I'll write different functions (probably read_word()/get_word()) with the new behavior. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Ok, it was easier than I imagined. Here's an example of the new shlex. Maintaining the old behavior (notice that now strings are accepted as arguments):
>>> import shlex
>>> l = shlex.shlex("'foo'a'bar'")
>>> l.get_token()
"'foo'"
>>> l.get_token()
"a'bar'"
New behavior:
>>> l = shlex.shlex("'foo'a'bar'", posix=1)
>>> l.get_token()
'fooabar'
Introduced iterator interface:
>>> for i in shlex.shlex("'foo'a'bar'"):
...     print i
...
'foo'
a'bar'
New function, mimicking shellwords:
>>> shlex.split_args("'foo'a'bar' -o='foo bar'")
['fooabar', '-o=foo bar']
I'm not sure if "posix" and "split_args" are the best names for these features. Suggestions? I've just committed patch #722686 (and assigned it to Guido, as he suggested recently ;-). -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
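For readers on a current Python: the function proposed here as `split_args` was released as `shlex.split()` (Python 2.3), and the `posix` flag kept its name. A short Python 3 sketch of its use; `shlex.quote()` (Python 3.3+) is added here only to show that quoting each word and splitting the joined string gives the same list back.

```python
import shlex

# shlex.split() applies POSIX quoting rules and splits on whitespace,
# i.e. the behaviour this message calls split_args().
args = shlex.split("'foo'a'bar' -o='foo bar'")
print(args)

# comments=False (the default) keeps '#' literal; comments=True drops
# everything from '#' to the end of the line.
print(shlex.split("ls -l # list files", comments=True))

# Quoting each word and joining with spaces yields a line that splits
# back into the original list.
line = " ".join(shlex.quote(word) for word in args)
print(line)
```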
Gustavo Niemeyer writes:
Ok, I'll check if there's an easy way to "turn" shlex into shellwords.
Is there any real objection to simply fixing shlex to get it right? I'm guessing that the divergence from shell quoting was more a matter of implementation expedience and a feeling that it was "good enough" for whatever original application it was written for. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
That is correct. I originally wrote shlex as the parser logic for a .netrc module. I would have no intrinsic objection to having this behavior fixed, though there is of course the general problem of how much we value not breaking old code. -- Eric S. Raymond (http://www.catb.org/~esr/)
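The concern about breaking old code is what the constructor flag addresses: in the shlex module as shipped, `posix` defaults to False, so existing callers keep the legacy tokenization and new code opts in explicitly. A short Python 3 illustration; the input string is an arbitrary example chosen to show a quoted space.

```python
import shlex

text = 'ab"c d"e'

# Default (posix=False): legacy behaviour, unchanged for old callers.
# Quote characters stay in the tokens and the space inside the quotes
# still ends a word.
legacy = list(shlex.shlex(text))

# Opt-in POSIX mode: the quotes group "c d" into the surrounding word
# and are removed, as a shell would do.
posix = list(shlex.shlex(text, posix=True))

print(legacy)
print(posix)
```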
Participants (5)
- Eric S. Raymond
- Fred L. Drake, Jr.
- Guido van Rossum
- Gustavo Niemeyer
- Skip Montanaro