PEP 324: popen5 - New POSIX process module

There's a new PEP available:

PEP 324: popen5 - New POSIX process module

A copy is included below. Comments are appreciated.

----

PEP: 324
Title: popen5 - New POSIX process module
Version: $Revision: 1.4 $
Last-Modified: $Date: 2004/01/03 10:32:53 $
Author: Peter Astrand <astrand@lysator.liu.se>
Status: Draft
Type: Standards Track (library)
Created: 19-Nov-2003
Content-Type: text/plain
Python-Version: 2.4


Abstract

    This PEP describes a new module for starting and communicating
    with processes on POSIX systems.


Motivation

    Starting new processes is a common task in any programming
    language, and very common in a high-level language like Python.
    Good support for this task is needed, because:

    - Inappropriate functions for starting processes could mean a
      security risk: If the program is started through the shell, and
      the arguments contain shell meta characters, the result can be
      disastrous. [1]

    - It makes Python an even better replacement language for
      over-complicated shell scripts.

    Currently, Python has a large number of different functions for
    process creation. This makes it hard for developers to choose.

    The popen5 module provides the following enhancements over
    previous functions:

    - One "unified" module provides all functionality from previous
      functions.

    - Cross-process exceptions: Exceptions happening in the child
      before the new process has started to execute are re-raised in
      the parent. This means that it's easy to handle exec()
      failures, for example. With popen2, for example, it's
      impossible to detect if the execution failed.

    - A hook for executing custom code between fork and exec. This
      can be used for, for example, changing uid.

    - No implicit call of /bin/sh. This means that there is no need
      for escaping dangerous shell meta characters.

    - All combinations of file descriptor redirection are possible.
      For example, the "python-dialog" [2] needs to spawn a process
      and redirect stderr, but not stdout.
      This is not possible with current functions, without using
      temporary files.

    - With popen5, it's possible to control if all open file
      descriptors should be closed before the new program is
      executed.

    - Support for connecting several subprocesses (shell "pipe").

    - Universal newline support.

    - A communicate() method, which makes it easy to send stdin data
      and read stdout and stderr data, without risking deadlocks.
      Most people are aware of the flow control issues involved with
      child process communication, but not all have the patience or
      skills to write a fully correct and deadlock-free select loop.
      This means that many Python applications contain race
      conditions. A communicate() method in the standard library
      solves this problem.


Rationale

    The following points summarize the design:

    - popen5 was based on popen2, which is tried-and-tested.

    - The factory functions in popen2 have been removed, because I
      consider the class constructor equally easy to work with.

    - popen2 contains several factory functions and classes for
      different combinations of redirection. popen5, however,
      contains one single class. Since popen5 supports 12 different
      combinations of redirection, providing a class or function for
      each of them would be cumbersome and not very intuitive. Even
      with popen2, this is a readability problem. For example, many
      people cannot tell the difference between popen2.popen2 and
      popen2.popen4 without using the documentation.

    - One small utility function is provided: popen5.run(). It aims
      to be an enhancement over os.system(), while still very easy to
      use:

      - It does not use the Standard C function system(), which has
        limitations.

      - It does not call the shell implicitly.

      - No need for quoting; using a variable argument list.

      - The return value is easier to work with.

    - The "preexec" functionality makes it possible to run arbitrary
      code between fork and exec.
      One might ask why there are special arguments for setting the
      environment and current directory, but not for, for example,
      setting the uid. The answer is:

      - Changing environment and working directory is considered
        fairly common.

      - Old functions like spawn() have support for an
        "env"-argument.

      - env and cwd are considered quite cross-platform: They make
        sense even on Windows.

    - No MS Windows support is available, currently. To be able to
      provide more functionality than what is already available from
      the popen2 module, help from C modules is required.


Specification

    This module defines one class called Popen:

        class Popen(args, bufsize=0, argv0=None,
                    stdin=None, stdout=None, stderr=None,
                    preexec_fn=None, preexec_args=(), close_fds=0,
                    cwd=None, env=None, universal_newlines=0)

    Arguments are:

    - args should be a sequence of program arguments. The program to
      execute is normally the first item in the args sequence, but
      can be explicitly set by using the argv0 argument. The Popen
      class uses os.execvp() to execute the child program.

    - bufsize, if given, has the same meaning as the corresponding
      argument to the built-in open() function: 0 means unbuffered, 1
      means line buffered, any other positive value means use a
      buffer of (approximately) that size. A negative bufsize means
      to use the system default, which usually means fully buffered.
      The default value for bufsize is 0 (unbuffered).

    - stdin, stdout and stderr specify the executed program's
      standard input, standard output and standard error file
      handles, respectively. Valid values are PIPE, an existing file
      descriptor (a positive integer), an existing file object, and
      None. PIPE indicates that a new pipe to the child should be
      created. With None, no redirection will occur; the child's
      file handles will be inherited from the parent. Additionally,
      stderr can be STDOUT, which indicates that the stderr data from
      the application should be captured into the same file handle
      as for stdout.
    - If preexec_fn is set to a callable object, this object will be
      called in the child process just before the child is executed,
      with arguments preexec_args.

    - If close_fds is true, all file descriptors except 0, 1 and 2
      will be closed before the child process is executed.

    - If cwd is not None, the current directory will be changed to
      cwd before the child is executed.

    - If env is not None, it defines the environment variables for
      the new process.

    - If universal_newlines is true, the file objects fromchild and
      childerr are opened as text files, but lines may be terminated
      by any of '\n', the Unix end-of-line convention, '\r', the
      Macintosh convention or '\r\n', the Windows convention. All of
      these external representations are seen as '\n' by the Python
      program. Note: This feature is only available if Python is
      built with universal newline support (the default). Also, the
      newlines attributes of the file objects fromchild, tochild and
      childerr are not updated by the communicate() method.

    The module also defines one shortcut function:

        run(*args):
            Run command with arguments. Wait for command to complete,
            then return the returncode attribute. Example:

            retcode = popen5.run("stty", "sane")


    Exceptions
    ----------

    Exceptions raised in the child process, before the new program
    has started to execute, will be re-raised in the parent.
    Additionally, the exception object will have one extra attribute
    called 'child_traceback', which is a string containing traceback
    information from the child's point of view.

    The most common exception raised is OSError. This occurs, for
    example, when trying to execute a non-existent file. Applications
    should prepare for OSErrors.

    A PopenException will also be raised if Popen is called with
    invalid arguments.


    Security
    --------

    popen5 will never call /bin/sh implicitly. This means that all
    characters, including shell metacharacters, can safely be passed
    to child processes.
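The popen5 module specified above is not importable from a stock Python, so the following is only a sketch that assumes the standard library's subprocess module (the module this PEP eventually became) as a stand-in. Its names differ from the draft, and the helpers echo_argument and try_missing_program, as well as the path /nonexistent/program-for-demo, are hypothetical. The sketch illustrates the two points made under Exceptions and Security: arguments given as a sequence reach the child without any shell interpreting them, and a failed exec shows up in the parent as an OSError.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python that writes back its first argument; no shell is used."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, arg],  # args passed as a sequence
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    return out.decode()


def try_missing_program():
    """Return the OSError raised when the program to execute does not exist."""
    try:
        # Hypothetical path, chosen only because it should not exist.
        subprocess.Popen(["/nonexistent/program-for-demo"])
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    # Shell metacharacters arrive in the child verbatim.
    print(echo_argument("$HOME; echo hi | `id`"))
    print(type(try_missing_program()).__name__)
```

Because the argument list is never joined into a command line, no quoting or escaping step is needed on the caller's side.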
    Popen objects
    -------------

    Instances of the Popen class have the following methods:

    poll()
        Returns -1 if child process hasn't completed yet, or its exit
        status otherwise. See below for a description of how the
        exit status is encoded.

    wait()
        Waits for and returns the exit status of the child process.
        The exit status encodes both the return code of the process
        and information about whether it exited using the exit()
        system call or died due to a signal. Functions to help
        interpret the status code are defined in the os module (the
        W*() family of functions).

    communicate(input=None)
        Interact with process: Send data to stdin. Read data from
        stdout and stderr, until end-of-file is reached. Wait for
        process to terminate. The optional input argument should be
        a string to be sent to the child process, or None, if no data
        should be sent to the child.

        communicate() returns a tuple (stdout, stderr).

        Note: The data read is buffered in memory, so do not use this
        method if the data size is large or unlimited.

    The following attributes are also available:

    fromchild
        A file object that provides output from the child process.

    tochild
        A file object that provides input to the child process.

    childerr
        A file object that provides error output from the child
        process.

    pid
        The process ID of the child process.

    returncode
        The child return code. A None value indicates that the
        process hasn't terminated yet. A negative value means that
        the process was terminated by a signal with number
        -returncode.


Open Issues

    Perhaps the module should be called something like "process",
    instead of "popen5".


Reference Implementation

    A reference implementation is available from
    http://www.lysator.liu.se/~astrand/popen5/.


References

    [1] Secure Programming for Linux and Unix HOWTO, section 8.3.
        http://www.dwheeler.com/secure-programs/

    [2] Python Dialog
        http://pythondialog.sourceforge.net/


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

-- 
/Peter Åstrand <astrand@lysator.liu.se>

On Sat, 3 Jan 2004, Peter Åstrand wrote:
PEP 324: popen5 - New POSIX process module
A copy is included below. Comments are appreciated.
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must choose between:

1) Rely on Mark Hammond's Win32 extensions. This makes it possible to keep popen5 as a pure Python module. The drawback is that Windows support won't be available unless the user installs win32all. Does anyone think there's a problem with adding a module to the Python standard library that uses win32all?

or

2) Write supporting code in C (basically, copy the required process handling functions from win32all).

-- 
/Peter Åstrand <astrand@lysator.liu.se>

I am not familiar with any other standard library module that requires a non-standard module. I don't believe this is a good idea.
2) Write supporting code in C (basically, copy the required process handling functions from win32all).
Depending on how stable the code has been, this may be the best idea. As long as the maintainer of the popen5 windows support code kept themselves updated on the status of applicable changes to win32all, this should go on without a hitch. It does bring up the fact that doing so would result in a possibly nontrivial amount of code duplication between a new portion of the python standard library, and a platform specific module. - Josiah

At 03-01-2004 19:02, you wrote:
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must choose between:
When it does support windows please make it work the same on all platforms. The existing popen code for unix is buggy and not compatible with the windows version or the docs.
The win32all extension is a thin wrapper over the Windows API. The proposed popen5 code would simply be some Windows-specific code that calls the Windows API directly. As far as I can see, there is no code in win32all that would need to be duplicated. Did I miss something?
- Josiah
Barry

[Barry]
Agreed.
I was not familiar with the structure of win32all. My vote goes for 'add in the required C portions for windows support'. And 'make all versions work the same'.
[Peter in regards to asyncore]
Using asyncore in *nix to do file IO doesn't seem to be that bad. I tried to do the same thing on Windows (to get stdin, stdout, and stderr to be non-blocking), but select doesn't work for file handles or pipes in Windows. If my memory serves me correctly, it was fairly trivial. I've actually got a bit of other code for buffered IO on sockets that is an asynchat work-alike, is around 70 lines, and uses asyncore. Using asynchat itself and setting a proper terminator would be easy, less than 20 lines (by my estimation). You include all the logic of asyncore.poll in popen5.Popen.communicate. Using asyncore.poll may be to your benefit. One thing to note is that string concatenation is not necessarily fast. That is: STR += DATA can be slow for initially large STR. Using a list with list.append and ''.join(list) is significantly faster for large buffers. I have been contemplating submitting a patch for asynchat that makes it behave better for large buffers. The only issue that I can see in using asyncore or asynchat is that each of {stdin, stdout, stderr} may need to have their own instance of asyncore.dispatcher or asynchat.async_chat. That would be pretty ugly. I still like the idea of popen5, but I think you may want to at least take a look at asyncore and friends. - Josiah
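The buffering advice above can be sketched in a few lines. This is only an illustration with hypothetical function names; it shows the two accumulation styles side by side and makes no timing claim, since how slow repeated concatenation is depends on the interpreter and on buffer sizes.

```python
def collect_with_concat(chunks):
    """Accumulate data with STR += DATA; each step may copy the whole buffer."""
    buf = ""
    for chunk in chunks:
        buf += chunk
    return buf


def collect_with_join(chunks):
    """Accumulate pieces in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    data = ["spam\n"] * 1000
    # Both styles yield the same string; only the copying pattern differs.
    print(collect_with_concat(data) == collect_with_join(data))
```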

Peter Åstrand wrote:
This PEP describes a new module for starting and communicating with processes on POSIX systems.
I see many aspects in this PEP that improve the existing implementation without changing the interface. I would suggest that you try to enhance the existing API (making changes to its semantics where reasonable), instead of coming up with a completely new module. With that approach, existing applications could use these features with little or no change.
- One "unified" module provides all functionality from previous functions.
I doubt this is a good thing. Different applications have different needs - having different APIs for them is reasonable.
This is a bug in popen2, IMO. Fixing it is a good thing, but does not require a new module.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
Such a hook could be merged as a keyword argument into the existing API.
- No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
This could be an option to the existing API. Make sure it works on all systems, though.
Sounds like a new function on the popen2 module.
This should be an option on the existing API.
- Support for connecting several subprocesses (shell "pipe").
Isn't this available already, as the shell supports pipe creation, anyway?
- Universal newline support.
This should be merged into the existing code.
Isn't asyncore supposed to simplify that? So in short, I'm -1 on creating a new module, but +1 on merging most of these features into the existing code base - they are good features. Regards, Martin

On Sat, 3 Jan 2004, Martin v. Loewis wrote:
I don't agree. I have used all of the existing mechanisms in lots of apps, and it's just a pain. There are lots of functions to choose between, but none does what you really want.
"Fixing popen2" would break old applications; exceptions will be raised that apps are not prepared for.
Into which module/method/function? None is flexible enough. The case for redirecting only stderr is just one example; this is simply not possible with the current API.
To support all combinations, 12 different functions are necessary. Who will remember what popen2.popen11() means?
With popen5, you can do it *without* using the shell.
- Universal newline support.
This should be merged into the existing code.
There's already a bug about this; bug 788035. This is what one of the comments says: "But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly complicated with all the platform special cases that I'm loath to touch it..." I haven't checked if this is really true, though.
Probably not. The description says: "This module provides the basic infrastructure for writing asynchronous socket service clients and servers." It's not obvious to me how this module could be used as a "shell backquote" replacement (which is what communicate() is about). It's probably possible though; I haven't tried. Even if this is possible I guess we need some kind of "entry" or "wrapper" method in the popen module to simplify things for the user. My guess is that a communicate() method that uses asyncore would be as long/complicated as the current implementation. The current implementation is only 68 lines, including comments.
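To make the "shell backquote" idea concrete, here is a minimal sketch. It assumes the standard library's subprocess module as a stand-in for popen5 (popen5 itself is not importable from a stock Python), and the helper name backquote is hypothetical. It shows the pattern a communicate() method is meant to cover: send some input, collect all of stdout and stderr, and wait for the child, without writing a select loop by hand.

```python
import subprocess
import sys


def backquote(args, input_data=None):
    """Run args without a shell; return (returncode, stdout, stderr) as bytes."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() feeds stdin and drains both pipes until end-of-file.
    out, err = proc.communicate(input_data)
    return proc.returncode, out, err


if __name__ == "__main__":
    upper = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    print(backquote([sys.executable, "-c", upper], b"hello"))
```

As the PEP notes for communicate(), everything read is held in memory, so this pattern suits small outputs only.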
Well, I don't see how this could be done easily: The current API is not flexible enough, and some things (like cross-process exceptions) break compatibility. Writing a good popen module is hard. Providing cross-platform support (for Windows, for example) is even harder. Trying to retrofit a good popen implementation into an old API without breaking compatibility seems impossible to me. I'm not prepared to try.

-- 
/Peter Åstrand <astrand@lysator.liu.se>

Peter Astrand wrote:
So enhance them, instead of replacing them.
I find that an acceptable incompatibility, and it will likely break no existing application. Applications usually expect that the program they start actually exists; it is a good thing that they can now detect the error of a missing/non-executable application. Errors should never pass silently.
For example, popen2.popen2, as argument preexec_fn.
Can you elaborate? What is the specific problem, what does your preexec function look like, and how is it used with popen5? I can then show you how it could be used with popen2, if that was enhanced appropriately.
Why is that? Just add a single function, with arguments stdin/stdout/stderr. No need for 12 functions. Then explain the existing functions in terms of your new function (if possible).
Why is that a good thing?
You really should work with the existing code base. Ignoring it is a guarantee that your PEP will be rejected. (Studying it, and then providing educated comments about it, might get you through) I think this is the core problem of your approach: You throw away all past history, and imply that you can do better than all prior contributors could. Honestly, this is doubtful. The current code is so complicated because implementing pipes is complicated.
I never said it would be easy. However, introducing a new popen module is a major change, and there must be strong indications that the current API cannot be enhanced before throwing it away. There should be one-- and preferably only one --obvious way to do it.

As for breaking compatibility: This is what the PEP should study in detail. It is sometimes acceptable to break compatibility, if applications are likely to be improved by the change. *Any* change can, in principle, break compatibility. Suppose I had an application that did

    from popen5 import open

This application might break if your proposed change is implemented, as a new module is added. So you can't claim "I will break no programs".
So I continue to be -1 with your PEP. Regards, Martin

On Sun, 4 Jan 2004, Martin v. Loewis wrote:
Not true. There are lots of apps out there that use fallback commands: they try to execute one, and if it doesn't exist, try another one. (One example is jakarta-gump, see http://cvs.apache.org/viewcvs.cgi/jakarta-gump/python/gump/utils/launcher.py?rev=1.6&view=auto) With the current API, you do this by checking if the return code is 127. No-one is prepared for an exception.

The return code stuff is also very problematic, and is another reason to make a new module and not "enhance" the old ones. With the current API (which calls /bin/sh most of the time), some return codes are overloaded by the shell. The shell uses these return codes:

126: the command was found but is not executable
127: the command was not found
128+n: the command was terminated by signal n

This means that it is currently impossible to use these return codes for programs launched via the current API, since you cannot tell the difference between a 127 generated by a successful call to your command, and a 127 generated by the shell. I don't see how this can be solved by "enhancing" the current functions, without breaking old applications.
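The ambiguity described above can be reproduced with a short sketch. It assumes a POSIX system with /bin/sh and uses the standard library's subprocess module as a stand-in for both styles of launching; the helper names and the path /nonexistent/demo-command are hypothetical. Through the shell, a missing command and a program that deliberately exits with 127 give the same status; without the shell, the missing command surfaces as an OSError instead.

```python
import shlex
import subprocess
import sys

MISSING = "/nonexistent/demo-command"  # hypothetical path that should not exist
EXIT_127 = "raise SystemExit(127)"     # child code that really exits with 127
REAL_127_LINE = "%s -c %s" % (shlex.quote(sys.executable), shlex.quote(EXIT_127))


def status_via_shell(command_line):
    """Launch through /bin/sh and return the status the shell reports."""
    return subprocess.call(command_line, shell=True, stderr=subprocess.DEVNULL)


def status_direct(args):
    """Launch without a shell; return the exit status, or None if exec failed."""
    try:
        return subprocess.call(args)
    except OSError:
        return None


if __name__ == "__main__":
    # Via the shell the two cases cannot be told apart.
    print(status_via_shell(MISSING), status_via_shell(REAL_127_LINE))
    # Without the shell they can.
    print(status_direct([MISSING]), status_direct([sys.executable, "-c", EXIT_127]))
```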
There are lots of other errors as well, not just missing/non-executable programs.
Yes, the preexec function feature could possibly be added to popen2. This is not the problem.
Just like popen5.Popen? Yes, that could be done. We would still have the problem with returncode incompatibilities, exceptions and such.
With popen5, you can do it *without* using the shell.
Why is that a good thing?
1) Performance. No need for parsing .bashrc on every call...

2) Security. You can do pipes without having to deal with all the quoting issues.

3) Getting rid of the shell's overloading of return codes.

It's also much more elegant, IMHO.
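A pipeline built without the shell might look like the following sketch. It assumes the standard library's subprocess module as a stand-in for popen5, and the two child programs are hypothetical one-liners run with the current interpreter. The stdout pipe of the first child is handed directly to the second child as its stdin, which is what `producer | consumer` would do in a shell.

```python
import subprocess
import sys

PRODUCER = "print('a'); print('b'); print('c')"
CONSUMER = "import sys; sys.stdout.write(str(len(sys.stdin.readlines())))"


def count_lines_pipeline():
    """Connect two children with a pipe and return the consumer's output."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,  # read end of the producer's pipe
        stdout=subprocess.PIPE,
    )
    # Drop the parent's copy of the pipe so only the consumer holds the read end.
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out.decode()


if __name__ == "__main__":
    print(count_lines_pipeline())
```

No command line is ever assembled, so none of the quoting issues from point 2 arise.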
In a discussion like this, I think it's important to separate the new API from the new implementation:

1) The new API. If you look at the popen5 implementation and PEP, it's obvious that I haven't thrown away the history. I have tried to take all the good parts from the various existing functions. The documentation contains 140 lines describing how to migrate from the earlier functions. Much of the current API has really never been designed. The API for the functions os.popen, os.system, os.popen2 comes from the old POSIX functions. These were never intended to be flexible, cross-platform or anything like that. So, it's not hard to do better than these.

2) The new implementation. When I wrote popen5, I took some good ideas out of popen2. The rest of the code is written from scratch.
The current code is so complicated because implementing pipes is complicated.
Let's keep the POSIX stuff separated from the Windows stuff. popen2.py does not depend on posixmodule.c on POSIX systems, and popen2.py is not complicated at all. The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated because:

1) It's written in low-level C

2) It contains lots of old DOS stuff

3) It tries to launch the program through the shell (which is always a pain).
I wouldn't say that introducing a new module is a "major change". Of course, we don't want to end up writing "popen6" in two years, because we've realized that "popen5" is too limited. That's why we should try to get it exactly right this time. I think it would be more useful if we put our energy into trying to accomplish that.
Isn't this quite a silly example? -- /Peter Åstrand <astrand@lysator.liu.se>

Peter,

After seeing the discussion that your PEP has sparked (without reading all of it) I am torn. I agree that Python's subprocess management facilities can use a lot of work. I agree that your code is by and large an improvement over popen2 and friends. But I'm not sure that it is *enough* of an improvement to be accepted into the standard library.

What I'm looking for in standard library additions is "category-killers". IMO good recent examples are optparse and the logging module. Both were in active use and had independent distributions with happy users before they were accepted into the standard library.

Note that in my definition, a category-killer doesn't have to have all conceivable features -- it has to strike the right balance between feature completeness and a set of properties that are usually referred to by words like elegance, simplicity, ease-of-use, ease-of-learning. For example, optparse specifically does not support certain styles of option parsing -- it strives to encourage uniformity in option syntax, and that rules out certain features. (I'm not saying that your popen5 has too many features -- I'm just warning that I'm not asking you to make it a category-killer by adding tons more features. I don't want more features, I want the satisfying feeling that this is the "best" solution, for my very personal definition of "best".)

I note that your source code at http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/popen5/?cvsroot=python-popen5 has some useful comments that were missing from the PEP, e.g. how to replace various older APIs with calls to popen5. It also has some motivation for the API choices you made that are missing from the PEP (e.g. why there are options for setting cwd).

So what's missing? For the PEP, I would say that a lot of explanatory and motivational text from the source code comments and doc strings should be added. (PEPs don't have to be terse formal documents that describe the feature in as few words as possible; that can be one section of the PEP, but in general a PEP needs to provide motivation and insight as well as specification.)

For the code, I think that a Windows version is essential. It would be okay if a small amount of code (e.g. making some basic Windows APIs available, like CreateProcess) had to be coded in C, as long as the bulk could still be in Python. I really like having this kind of code in Python -- it helps understanding the fine details, and it allows subclassing more easily.

I also wonder if more support for managing a whole flock of subprocesses might not be a useful addition; this is a common need in some applications. And with this, I'd like to see explicit support (somehow -- maybe through a separate class) for managing "daemon" processes. This certainly is a common need!

I would like this to replace all other "high-level" subprocess management facilities, in particular os.system() and os.popen() and the popen2 module, on all platforms. Oh, and also os.spawn*(). But not os.fork() and os.exec*(), since those are the building blocks (on Unix, anyway).

--Guido van Rossum (home page: http://www.python.org/~guido/)

This PEP looks fantastic to me; I've often wanted a better set of primitives for working with external processes, and your PEP has just about everything I've ever wanted. I do have some minor nits to pick, though:

- The preexec_args argument is extraneous considering how easy it is to use a lambda construct to bind arguments to a function. Consider:

      Popen(..., preexec_fn=foo, preexec_args=(bar,baz))

  vs.

      Popen(..., preexec_fn=lambda: foo(bar,baz))

- Rather than passing argv0 as a named argument, what about passing in the program to execute instead? It would simplify code like this:

      Popen(child_program, *child_args[1:], argv0=child_args[0])

  into this

      Popen(*child_args, program=child_program)

  Of course I'm not suggesting you change the default behavior of taking argv0 *and* the program to execute from the first positional argument.

- The defaults for close_fds and universal_newlines should be False rather than 0; this would make it clear from reading the synopsis that these arguments are boolean-valued.

- The poll() method and returncode attribute aren't both needed. I like returncode's semantics better since it gives more information and it avoids using -1 as a magic number.

- Rather than raising a PopenException for incorrect arguments, I would think that raising ValueError or TypeError would be more in line with users' expectations.

- How does this communicate() method interact with I/O redirection?

- As you and others have mentioned, this API needs a better name than popen5; I like "process" a lot better.

- Would you consider adding an example of how to chain processes together into a pipeline? You say this is possible, and I'm assuming that's what the PIPE constant is for, but I'd like to see it written out to make sure I'm understanding it correctly.
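The lambda form from the first nit can be tried out with a small sketch. It assumes a POSIX system and the standard library's subprocess module as a stand-in for popen5; subprocess.Popen accepts a preexec_fn but has no preexec_args, so the argument is bound with a lambda exactly as suggested. The helper name run_with_umask is hypothetical, and os.umask is just a convenient example of something set between fork and exec.

```python
import os
import subprocess
import sys


def run_with_umask(mask):
    """Start a child whose umask is set between fork and exec; return what it sees."""
    # The child reports the umask it inherited (os.umask returns the old value).
    child_code = "import os, sys; sys.stdout.write(str(os.umask(0)))"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        preexec_fn=lambda: os.umask(mask),  # argument bound by the lambda
    )
    out, _ = proc.communicate()
    return int(out.decode())


if __name__ == "__main__":
    print(oct(run_with_umask(0o027)))
```

A preexec hook runs in the forked child, so it is best kept to simple calls like this one, and it is generally not considered safe in programs that use threads.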

On Sat, 3 Jan 2004, Peter Åstrand wrote:
PEP 324: popen5 - New POSIX process module
A copy is included below. Comments are appreciated.
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must chose between: 1) Rely on Mark Hammonds Win32 extensions. This makes it possible to keep popen5 as a pure Python module. The drawback is that Windows support won't be available unless the user installs win32all. Does anyone think there's a problem with adding a module to the Python standard library that uses win32all? or 2) Write supporting code in C (basically, copy the required process handling functions from win32all). -- /Peter Åstrand <astrand@lysator.liu.se>

I am not familliar with any other standard library module that requres a non-standard module. I don't believe this is a good idea.
2) Write supporting code in C (basically, copy the required process handling functions from win32all).
Depending on how stable the code has been, this may be the best idea. As long as the maintainer of the popen5 windows support code kept themselves updated on the status of applicable changes to win32all, this should go on without a hitch. It does bring up the fact that doing so would result in a possibly nontrivial amount of code duplication between a new portion of the python standard library, and a platform specific module. - Josiah

At 03-01-2004 19:02, you wrote:
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must chose between:
When it does support windows please make it work the same on all platforms. The existing popen code for unix is buggy and not compatible with the windows version or the docs.
The win32all extension is a thin wrapper over the windows API. The proposed popen5 code would simply be some windows specific code that calls windows API directly. There is no code in win32all that would be needed to be duplicated as far as I can see. Did I miss something?
- Josiah
Barry

[Barry]
Agreed.
I was not familliar with the structure of win32all. My vote goes for 'add in the required C portions for windows support'. And 'make all versions work the same'. [Peter in regards to asyncore]
Using asyncore in *nix to do file IO doesn't seem to be that bad. I tried to do the same thing on Windows (to get stdin, stdout, and stderr to be non-blocking), but select doesn't work for file handles or pipes in Windows. If my memory serves me correctly, it was fairly trivial. I've actually got a bit of other code for buffered IO on sockets that is an asynchat work-alike, is around 70 lines, and uses asyncore. Using asynchat itself and setting a proper terminator would be easy, less than 20 lines (by my estimation). You include all the logic of asyncore.poll in popen5.Popen.communicate. Using asyncore.poll may be to your benefit. One thing to note is that string concatenation is not necessarily fast. That is: STR += DATA can be slow for initially large STR. Using a list with list.append and ''.join(list) is significantly faster for large buffers. I have been contemplating submitting a patch for asynchat that makes it behave better for large buffers. The only issue that I can see in using asyncore or asynchat is that each of {stdin, stdout, stderr} may need to have their own instance of asyncore.dispatcher or asynchat.async_chat. That would be pretty ugly. I still like the idea of popen5, but I think you may want to at least take a look at asyncore and friends. - Josiah

Peter Åstrand wrote:
This PEP describes a new module for starting and communicating with processes on POSIX systems.
I see many aspects in this PEP that improve the existing implementation without changing the interface. I would suggest that you try to enhance the existing API (making changes to its semantics where reasonable), instead of coming up with a completely new module. With that approach, existing applications could use these features with no or little change.
- One "unified" module provides all functionality from previous functions.
I doubt this is a good thing. Different applications have different needs - having different API for them is reasonable.
This is a bug in popen2, IMO. Fixing it is a good thing, but does not require a new module.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
Such a hook could be merged as a keyword argument into the existing API.
- No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
This could be an option to the existing API. Make sure it works on all systems, though.
Sounds like a new function on the popen2 module.
This should be an option on the existing API.
- Support for connecting several subprocesses (shell "pipe").
Isn't this available already, as the shell supports pipe creation, anyway?
- Universal newline support.
This should be merged into the existing code.
Isn't asyncore supposed to simplify that? So in short, I'm -1 on creating a new module, but +1 on merging most of these features into the existing code base - they are good features. Regards, Martin

On Sat, 3 Jan 2004, Martin v. Loewis wrote:
I don't agree. I have used all of the existing mechanisms in lots of apps, and it's just a pain. There are lots of functions to choose between, but none does what you really want.
"Fixing popen2" would mean breaking old applications; exceptions would be raised that apps are not prepared for.
Into which module/method/function? None of them is flexible enough. The case for redirecting only stderr is just one example; this is simply not possible with the current API.
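For illustration, the stderr-only case might look like the sketch below. It uses the standard library's subprocess module as a stand-in, which is an assumption: popen5's own API as discussed in this thread may differ. The child command and the helper name are made up for the example.

```python
import subprocess
import sys

# Hypothetical child: writes one line to stdout and one to stderr.
child_code = "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"


def run_capture_stderr_only(argv):
    """Run argv, capturing stderr while stdout is inherited from the parent."""
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE)
    _, err = proc.communicate()  # first element is None: stdout was not piped
    return proc.returncode, err


returncode, err = run_capture_stderr_only([sys.executable, "-c", child_code])
```

No temporary file and no shell redirection syntax is involved; the stdout line goes wherever the parent's stdout goes.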
To support all combinations, 12 different functions are necessary. Who will remember what popen2.popen11() means?
With popen5, you can do it *without* using the shell.
- Universal newline support.
This should be merged into the existing code.
There's already a bug about this; bug 788035. This is what one of the comments says: "But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly complicated with all the platform special cases that I'm loath to touch it..." I haven't checked if this is really true, though.
Probably not. The description says: "This module provides the basic infrastructure for writing asynchronous socket service clients and servers." It's not obvious to me how this module could be used as a "shell backquote" replacement (which is what communicate() is about). It's probably possible though; I haven't tried. Even if this is possible I guess we need some kind of "entry" or "wrapper" method in the popen module to simplify things for the user. My guess is that a communicate() method that uses asyncore would be as long/complicated as the current implementation. The current implementation is only 68 lines, including comments.
Well, I don't see how this could be done easily: The current API is not flexible enough, and some things (like cross-process exceptions) break compatibility. Writing a good popen module is hard. Providing cross-platform support (for Windows, for example) is even harder. Trying to retrofit a good popen implementation into an old API without breaking compatibility seems impossible to me. I'm not prepared to try. -- /Peter Åstrand <astrand@lysator.liu.se>
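A "shell backquote" style helper built on a communicate() method could be sketched as follows. This uses the standard library's subprocess module as a stand-in, which is an assumption: the popen5 class discussed here may differ in detail. The helper name backquote and the child command are hypothetical.

```python
import subprocess
import sys


def backquote(argv, input_data=b""):
    """Feed input_data to argv's stdin and collect stdout and stderr.

    communicate() handles the reads and writes together, so the caller
    does not need to write a select loop to avoid deadlocks.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(input_data)
    return proc.returncode, out, err


# Hypothetical child: upper-cases whatever arrives on stdin.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
status, out, err = backquote(upper, b"hello\n")
```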

Peter Astrand wrote:
So enhance them, instead of replacing them.
I find that an acceptable incompatibility, and it will likely break no existing application. Applications usually expect that the program they start actually exists; it is a good thing that they can now detect the error of a missing/non-executable application. Errors should never pass silently.
For example, popen2.popen2, as argument preexec_fn.
Can you elaborate? What is the specific problem, what does your preexec function look like, and how is it used with popen5? I can then show you how it could be used with popen2, if that was enhanced appropriately.
Why is that? Just add a single function, with arguments stdin/stdout/stderr. No need for 12 functions. Then explain the existing functions in terms of your new function (if possible).
Why is that a good thing?
You really should work with the existing code base. Ignoring it is a guarantee that your PEP will be rejected. (Studying it, and then providing educated comments about it, might get you through) I think this is the core problem of your approach: You throw away all past history, and imply that you can do better than all prior contributors could. Honestly, this is doubtful. The current code is so complicated because implementing pipes is complicated.
I never said it would be easy. However, introducing a new popen module is a major change, and there must be strong indications that the current API cannot be enhanced before throwing it away. There should be one-- and preferably only one --obvious way to do it. As for breaking compatibility: This is what the PEP should study in detail. It is sometimes acceptable to break compatibility, if applications are likely to be improved by the change. *Any* change can, in principle, break compatibility. Suppose I had an application that did
    from popen5 import open
This application might break if your proposed change is implemented, as a new module is added. So you can't claim "I will break no programs".
So I continue to be -1 with your PEP. Regards, Martin

On Sun, 4 Jan 2004, Martin v. Loewis wrote:
Not true. There are lots of apps out there that use fallback commands: they try to execute one command and, if it doesn't exist, try another. (One example is jakarta-gump, see http://cvs.apache.org/viewcvs.cgi/jakarta-gump/python/gump/utils/launcher.py?rev=1.6&view=auto) With the current API, you do this by checking if the return code is 127. No one is prepared for an exception. The return code stuff is also very problematic, and is another reason to make a new module rather than "enhance" the old ones. With the current API (which calls /bin/sh most of the time), some return codes are overloaded by the shell. The shell uses these return codes:
    126: the command was found but is not executable
    127: the command was not found
    128+n: the command was terminated by signal n
This means that it is currently impossible to use these return codes for programs launched via the current API, since you cannot tell the difference between a 127 generated by a successful call to your command, and a 127 generated by the shell. I don't see how this can be solved by "enhancing" the current functions, without breaking old applications.
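The ambiguity of return code 127 can be reproduced with a small sketch. It assumes a POSIX /bin/sh and uses the standard library's subprocess module as a stand-in for both styles of launching (through the shell, and directly); the command name and helper names are made up for illustration.

```python
import shlex
import subprocess
import sys


def run_via_shell(command_line):
    """Run a command line through /bin/sh and return the exit status."""
    return subprocess.call(command_line, shell=True,
                           stderr=subprocess.DEVNULL)


def run_direct(argv):
    """Run argv without a shell; return None if it could not be started."""
    try:
        return subprocess.call(argv)
    except OSError:
        # Raised in the parent when the exec fails, e.g. program not found.
        return None


# Hypothetical command name that should not exist on any system.
missing = "no-such-command-popen5-demo"
# A real program that deliberately exits with status 127.
exits_127 = [sys.executable, "-c", "import sys; sys.exit(127)"]
exits_127_line = " ".join(shlex.quote(arg) for arg in exits_127)

shell_missing = run_via_shell(missing)          # the shell reports "not found"
shell_real_127 = run_via_shell(exits_127_line)  # the program itself exits 127
direct_missing = run_direct([missing])
direct_real_127 = run_direct(exits_127)
```

Through the shell the two cases yield the same status, while the direct launch reports the missing program separately from the program's own exit status.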
There are lots of other errors as well, not just missing/non-executable programs.
Yes, the preexec function feature could possibly be added to popen2. This is not the problem.
Just like popen5.Popen? Yes, that could be done. We would still have the problem with return code incompatibilities, exceptions and such.
With popen5, you can do it *without* using the shell.
Why is that a good thing?
1) Performance. No need for parsing .bashrc on every call...
2) Security. You can do pipes without having to deal with all the quoting issues.
3) Getting rid of the shell's overloading of return codes.
It's also much more elegant, IMHO.
In a discussion like this, I think it's important to separate the new API from the new implementation: 1) The new API. If you look at the popen5 implementation and PEP, it's obvious that I haven't thrown away the history. I have tried to take all the good parts from the various existing functions. The documentation contains 140 lines describing how to migrate from the earlier functions. Much of the current API has really never been designed. The API for the functions os.popen, os.system, os.popen2 comes from the old POSIX functions. These were never intended to be flexible, cross-platform or anything like that. So, it's not hard to do better than these. 2) The new implementation. When I wrote popen5, I took some good ideas out of popen2. The rest of the code is written from scratch.
The current code is so complicated because implementing pipes is complicated.
Let's keep the POSIX stuff separated from the Windows stuff. popen2.py does not depend on posixmodule.c on POSIX systems, and popen2.py is not complicated at all. The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated because: 1) It's written in low-level C 2) It contains lots of old DOS stuff 3) It tries to launch the program through the shell (which is always a pain).
I wouldn't say that introducing a new module is a "major change". Of course, we don't want to end up writing "popen6" in two years, because we've realized that "popen5" is too limited. That's why we should try to get it exactly right this time. I think it would be more useful if we put our energy into trying to accomplish that.
Isn't this quite a silly example? -- /Peter Åstrand <astrand@lysator.liu.se>

Peter, After seeing the discussion that your PEP has sparked (without reading all of it) I am torn. I agree that Python's subprocess management facilities can use a lot of work. I agree that your code is by and large an improvement over popen2 and friends. But I'm not sure that it is *enough* of an improvement to be accepted into the standard library. What I'm looking for in standard library additions is "category-killers". IMO good recent examples are optparse and the logging module. Both were in active use and had independent distributions with happy users before they were accepted into the standard library. Note that in my definition, a category-killer doesn't have to have all conceivable features -- it has to strike the right balance between feature completeness and a set of properties that are usually referred to by words like elegance, simplicity, ease-of-use, ease-of-learning. For example, optparse specifically does not support certain styles of option parsing -- it strives to encourage uniformity in option syntax, and that rules out certain features. (I'm not saying that your popen5 has too many features -- I'm just warning that I'm not asking you to make it a category-killer by adding tons more features. I don't want more features, I want the satisfying feeling that this is the "best" solution, for my very personal definition of "best".) I note that your source code at http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/popen5/?cvsroot=python-popen5 has some useful comments that were missing from the PEP, e.g. how to replace various older APIs with calls to popen5. It also has some motivation for the API choices you made that are missing from the PEP (e.g. why there are options for setting cwd). So what's missing? For the PEP, I would say that a lot of explanatory and motivational text from the source code comments and doc strings should be added. 
(PEPs don't have to be terse formal documents that describe the feature in as few words as possible; that can be one section of the PEP, but in general a PEP needs to provide motivation and insight as well as specification.) For the code, I think that a Windows version is essential. It would be okay if a small amount of code (e.g. making some basic Windows APIs available, like CreateProcess) had to be coded in C, as long as the bulk could still be in Python. I really like having this kind of code in Python -- it helps understanding the fine details, and it allows subclassing more easily. I also wonder if more support for managing a whole flock of subprocesses might not be a useful addition; this is a common need in some applications. And with this, I'd like to see explicit support (somehow -- maybe through a separate class) for managing "daemon" processes. This certainly is a common need! I would like this to replace all other "high-level" subprocess management facilities, in particular os.system() and os.popen() and the popen2 module, on all platforms. Oh, and also os.spawn*(). But not os.fork() and os.exec*(), since those are the building blocks (on Unix, anyway). --Guido van Rossum (home page: http://www.python.org/~guido/)

This PEP looks fantastic to me; I've often wanted a better set of primitives for working with external processes, and your PEP has just about everything I've ever wanted. I do have some minor nits to pick, though:
- The preexec_args argument is extraneous considering how easy it is to use a lambda construct to bind arguments to a function. Consider:
    Popen(..., preexec_fn=foo, preexec_args=(bar,baz))
  vs.
    Popen(..., preexec_fn=lambda: foo(bar,baz))
- Rather than passing argv0 as a named argument, what about passing in the program to execute instead? It would simplify code like this:
    Popen(child_program, *child_args[1:], argv0=child_args[0])
  into this:
    Popen(*child_args, program=child_program)
  Of course I'm not suggesting you change the default behavior of taking argv0 *and* the program to execute from the first positional argument.
- The defaults for close_fds and universal_newlines should be False rather than 0; this would make it clear from reading the synopsis that these arguments are boolean-valued.
- The poll() method and returncode attribute aren't both needed. I like returncode's semantics better since it gives more information and it avoids using -1 as a magic number.
- Rather than raising a PopenException for incorrect arguments, I would think that raising ValueError or TypeError would be more in line with users' expectations.
- How does this communicate() method interact with I/O redirection?
- As you and others have mentioned, this API needs a better name than popen5; I like "process" a lot better.
- Would you consider adding an example of how to chain processes together into a pipeline? You say this is possible, and I'm assuming that's what the PIPE constant is for, but I'd like to see it written out to make sure I'm understanding it correctly.
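As a rough illustration of the pipeline question, chaining two processes without a shell might look like the sketch below. It is written against the standard library's subprocess module as a stand-in, which is an assumption: the popen5 draft discussed in this thread may spell things differently. The two child commands and the pipeline helper are hypothetical.

```python
import subprocess
import sys

# Hypothetical stages: the first prints three lines, the second sorts them.
producer = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pipeline(first_argv, second_argv):
    """Connect first_argv's stdout to second_argv's stdin, no shell involved."""
    first = subprocess.Popen(first_argv, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_argv, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe so the first process sees a
    # broken pipe if the second one exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


result = pipeline(producer, consumer)
```

Because each stage is given as an argument list, no quoting of shell metacharacters is needed.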

At 04-01-2004 00:14, Martin v. Loewis wrote:
With popen5, you can do it *without* using the shell.
Why is that a good thing?
Because using the shell on Windows causes a DOS box window to appear for every popen2/3/4 use in a windowed Python program. Barry
participants (7)
- Barry Scott
- Guido van Rossum
- John Williams
- Josiah Carlson
- Martin v. Loewis
- Peter Astrand
- Peter Åstrand