Re: Idiomatic API for shell scripting

(Changed subject, as I don't think a PEP is the right focus for thinking about this problem space.)

Thiago Padilha wrote:
I 100% agree with this problem statement. As others have mentioned,
there are several libraries people have written that try to do
something in this direction. I’ve never found one that I felt
approached the problem from quite the right angle, so I’ve been
experimenting recently with my own solution (incorporating ideas from
friends.) I guess this is a good prompt to discuss it more widely!

All the ideas below are in the working implementation at
https://github.com/gnprice/pysh .

----------

First, one unsung strength of Bash (and friends) that I think is
actually an essential foundation is what happens when there’s no
redirection, no pipelines, no fancy substitutions -- you want to just
run a program, and pass it some arguments.

The core thing you need to do here is make a list of strings, for the
command and its arguments. That’s how the underlying API works. So for
example in Bash:

    # 5 strings:
    gpg -d -o "$cleartext_path" "$cryptotext_path"

    # (4 + len(commit_ids)) strings:
    git merge "${commit_ids[@]}" -m "$message"

In Python this typically ends up looking like:

    subprocess.check_call(['gpg', '-d', '-o', cleartext_path, cryptotext_path])

    subprocess.check_call(['git', 'merge'] + commit_ids + ['-m', message])

For the arguments that are literals -- which often means most of them
-- we keep saying quote-comma-space-quote, quote-comma-space-quote,
effectively as a delimiter. That’s a lot of visual noise, as well as
extra typing. And when you want to splice in a whole list of
arguments, it gets worse.

What we want here is something almost like this:

    subprocess.check_call(
        f'gpg -d -o {cleartext_path} {cryptotext_path}'.split())  # BAD!

... except that version steps into the classic shell pitfall where if
a value happens to contain whitespace, it turns into several arguments
and totally changes the meaning.

But it’s so close.
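
To make the whitespace pitfall concrete, here is a minimal sketch. The
two path values are hypothetical, chosen only so that one of them
contains a space; nothing is actually executed.

```python
# Hypothetical values, chosen only to show the pitfall; neither path
# comes from the original message.
cleartext_path = 'my notes.txt'   # contains a space
cryptotext_path = 'notes.gpg'

# Explicit list: exactly one argv entry per intended argument.
explicit = ['gpg', '-d', '-o', cleartext_path, cryptotext_path]

# Substitute first, split second: the space inside the value is
# treated as a word boundary too.
naive = f'gpg -d -o {cleartext_path} {cryptotext_path}'.split()

print(len(explicit), explicit)
print(len(naive), naive)
```

The explicit list has 5 entries, while the format-then-split version
has 6: the path has been broken into `my` and `notes.txt`, so `-o`
would now point at a file named `my`.
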
Really we just want to do that but with the `split` first -- so it
operates on the literal string you see in the source code -- and
*then* the `format`, or the f-string substitution.

In fact you can implement a 70% solution in just a line:

    def shwords(fmt, **kwargs):
        return [word.format(**kwargs) for word in fmt.split()]

    >>> shwords('rm -rf {tmpdir}/{userdoc}', tmpdir='/tmp', userdoc='1 .. 2')
    ['rm', '-rf', '/tmp/1 .. 2']

A bit more work gives you f-string-like behavior, if you opt into it:

    subprocess.check_call(shwords_f('rm -rf {tmpdir}/{userdoc}'))

    check_cmd_f('rm -rf {tmpdir}/{userdoc}')  # small convenience helper

and some more gives you positional arguments:

    check_cmd('gpg -d -o {} {}', cleartext_path, cryptotext_path)

and a format-minilanguage extension `{...!@}`, to substitute in a
whole list:

    check_cmd('git merge {!@} -m {}', commit_ids, message)

Full implementation here:
https://github.com/gnprice/pysh/blob/8faa55b06/pysh/words.py

As far as I know, this idea is a new one. I’ve found it goes a long
way in making scripts that run a lot of external commands feel
Pythonically concise and low-boilerplate.

----------

Then there are a number of features on top of that foundation that we
need to really make interacting with external Unix-style commands as
convenient as it can be. To avoid making this email longer I’ll just
briefly gesture at two of them for now -- more details in the repo!

* Getting a command’s output in convenient form -- not only on stdout,
  but success/failure from the return code:

      if not try_cmd('git diff-index --quiet HEAD'):
          raise ClickException("worktree not clean")

      # ...

      commit_id = try_slurp_cmd_f('git rev-parse --verify --quiet {name}')
      if commit_id is None:
          commit_id = try_slurp_cmd_f('git rev-parse --verify --quiet origin/{name}')
      if commit_id is None:
          raise MissingBranchException(name)

* Pipelines.
  :-) The `|` operator really is too good to pass up:

      def first_mention(pattern):
          return pysh.slurp(
              cmd.run('git log --oneline --reverse --abbrev=9')
              | cmd.run('grep -m1 {}', pattern)
              | cmd.run('perl -lane {}', 'print $F[0]')
          )

  I think one important direction for pipelines is making it
  convenient to stick bits of Python code in the middle of the
  pipeline, in amongst external programs. That direction isn’t fully
  developed in the current API.

----------

For anyone who finds this interesting, I’d encourage you to check out
the demo scripts:
https://github.com/gnprice/pysh/tree/master/example

Then if you take some interesting fragment of script you have handy
and try converting it to use this library, I’d be very curious to hear
how it turns out! That’s what the demo/example scripts are for, and I
think it’s the main ingredient this work needs at this stage: seeing
how the API works out in a range of use cases, and seeing what
patterns it can be improved to better serve.

Also naturally I will be glad to discuss it here for as long as others
are interested.

Cheers,
Greg
Greg Price