[issue1521950] shlex.split() does not tokenize like the shell

Dan Christian report at bugs.python.org
Sat Nov 26 15:40:24 CET 2011


Dan Christian <robodan at users.sourceforge.net> added the comment:

On Sat, Nov 26, 2011 at 7:12 AM, Éric Araujo <report at bugs.python.org> wrote:
> Your script passes with dash, which is probably the most POSIX-compliant shell we can find.  (bash has extensions, zsh/csh don’t use the POSIX shell language, so I think the behavior of dash should be our reference, not the bash man page.)

I was just looking for a reference where I didn't have to sift through
tons of documentation.  Most systems have bash.  Before that I was
just working from experience (I've done a lot of shell scripting).

> there is code out there that depends on the current behavior of shlex and does not need to support && || ; ( ), if we add support for these tokens we should not break the existing code.

Here's a thought on how that might work (just brainstorming).  shlex
uses a series of character strings to drive its parsing: whitespace,
escape, quotes.  Add another one: control = '();<>|&'.  If it is unset
(by default?), then the behavior is as before.  If it is set, then
shlex will output any character in control as a separate token.
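A minimal sketch of that idea follows. It assumes a simplified shell lexer with no backslash escapes and no comments. `split_with_control` and `CONTROL` are hypothetical names and are not part of shlex. When `control` is empty, the function falls back to plain whitespace-and-quote splitting, which matches "the behavior is as before". For reference, Python 3.6 later added a `punctuation_chars` option to `shlex.shlex` along these lines, though it returns a run of such characters as a single token rather than one token per character.

```python
# Hypothetical sketch of the proposed "control" character set.
# This is NOT the shlex API: it is a tiny standalone tokenizer.
CONTROL = '();<>|&'


def split_with_control(s, control=CONTROL):
    """Split like a simplified shell.

    Whitespace separates words, single/double quotes group text, and
    every character in `control` becomes its own one-character token.
    Backslash escapes and comments are deliberately not handled.
    """
    tokens = []
    word = []
    in_word = False  # lets '' produce an empty word instead of nothing
    quote = None     # the currently open quote character, if any
    for ch in s:
        if quote:
            if ch == quote:
                quote = None          # closing quote; word continues
            else:
                word.append(ch)       # control chars are literal in quotes
        elif ch in ('"', "'"):
            quote = ch
            in_word = True
        elif ch.isspace():
            if in_word:
                tokens.append(''.join(word))
                word, in_word = [], False
        elif control and ch in control:
            if in_word:               # a control char also ends a word
                tokens.append(''.join(word))
                word, in_word = [], False
            tokens.append(ch)
        else:
            word.append(ch)
            in_word = True
    if quote:
        raise ValueError('No closing quotation')
    if in_word:
        tokens.append(''.join(word))
    return tokens
```

For example, `split_with_control("echo 'a;b' | wc")` yields `['echo', 'a;b', '|', 'wc']`. The quoted `;` stays inside its word, while the bare `|` becomes its own token.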

There might be a shell-specific script (or maybe it's left to the
user) that decides which tokens can be recombined: '&&', '||',
'|&', '>>', etc. This code is pretty simple: walk the token
sequence, and whenever two adjacent tokens form such a pair, pop the
second and combine it into the first.
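The recombining pass could look roughly like the sketch below. It works on a list of single-character control tokens such as the one the control-character idea would produce. `combine_operators` and the `COMBINABLE` set are hypothetical, and the set itself is an assumption modeled on bash. Another shell might use a different set.

```python
# Hypothetical post-processing pass: merge adjacent tokens that together
# form a two-character operator. The operator set is an assumption.
COMBINABLE = {'&&', '||', '|&', '>>', '<<', ';;'}


def combine_operators(tokens, combinable=COMBINABLE):
    """Walk the tokens; if the previous token plus the current one forms
    a known pair, fold the current token into the previous one."""
    out = []
    for tok in tokens:
        if out and out[-1] + tok in combinable:
            out[-1] += tok   # "pop the second and combine it into the first"
        else:
            out.append(tok)
    return out
```

Because the pass only sees the flat token list, it cannot distinguish `& &` (two separate operators in the input) from `&&`, or a quoted `'&'` from a bare one. A fuller version would need the lexer to record whether each token was quoted or preceded by whitespace.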

-Dan

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1521950>
_______________________________________

