PEP 324: popen5 - New POSIX process module
There's a new PEP available:
PEP 324: popen5 - New POSIX process module
A copy is included below. Comments are appreciated.
----
PEP: 324
Title: popen5 - New POSIX process module
Version: $Revision: 1.4 $
Last-Modified: $Date: 2004/01/03 10:32:53 $
Author: Peter Astrand
On Sat, 3 Jan 2004, Peter Åstrand wrote:
PEP 324: popen5 - New POSIX process module
A copy is included below. Comments are appreciated.
There are some issues wrt Windows support. Currently, popen5 does not
support Windows at all. To be able to do so, we must choose between:
1) Rely on Mark Hammond's Win32 extensions. This makes it possible to keep
popen5 as a pure Python module. The drawback is that Windows support won't
be available unless the user installs win32all. Does anyone think there's
a problem with adding a module to the Python standard library that uses
win32all?
or
2) Write supporting code in C (basically, copy the required process
handling functions from win32all).
--
/Peter Åstrand
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must choose between:
1) Rely on Mark Hammond's Win32 extensions. This makes it possible to keep popen5 as a pure Python module. The drawback is that Windows support won't be available unless the user installs win32all. Does anyone think there's a problem with adding a module to the Python standard library that uses win32all?
I am not familiar with any other standard library module that requires a non-standard module. I don't believe this is a good idea.
2) Write supporting code in C (basically, copy the required process handling functions from win32all).
Depending on how stable the code has been, this may be the best idea. As long as the maintainer of the popen5 windows support code kept themselves updated on the status of applicable changes to win32all, this should go on without a hitch. It does bring up the fact that doing so would result in a possibly nontrivial amount of code duplication between a new portion of the python standard library, and a platform specific module. - Josiah
At 03-01-2004 19:02, you wrote:
There are some issues wrt Windows support. Currently, popen5 does not support Windows at all. To be able to do so, we must choose between:
When it does support windows please make it work the same on all platforms. The existing popen code for unix is buggy and not compatible with the windows version or the docs.
2) Write supporting code in C (basically, copy the required process handling functions from win32all).
Depending on how stable the code has been, this may be the best idea. As long as the maintainer of the popen5 windows support code kept themselves updated on the status of applicable changes to win32all, this should go on without a hitch.
The win32all extension is a thin wrapper over the Windows API. The proposed popen5 code would simply be some Windows-specific code that calls the Windows API directly. There is no code in win32all that would need to be duplicated as far as I can see. Did I miss something?
- Josiah
Barry
[Barry]
When it does support windows please make it work the same on all platforms. The existing popen code for unix is buggy and not compatible with the windows version or the docs.
Agreed.
The win32all extension is a thin wrapper over the Windows API. The proposed popen5 code would simply be some Windows-specific code that calls the Windows API directly. There is no code in win32all that would need to be duplicated as far as I can see. Did I miss something?
I was not familiar with the structure of win32all. My vote goes for 'add in the required C portions for windows support'. And 'make all versions work the same'.
[Peter in regards to asyncore]
Probably not. The description says:
"This module provides the basic infrastructure for writing asynchronous socket service clients and servers."
It's not obvious to me how this module could be used as a "shell backquote" replacement (which is what communicate() is about). It's probably possible though; I haven't tried. Even if this is possible I guess we need some kind of "entry" or "wrapper" method in the popen module to simplify things for the user. My guess is that a communicate() method that uses asyncore would be as long/complicated as the current implementation. The current implementation is only 68 lines, including comments.
Using asyncore in *nix to do file IO doesn't seem to be that bad. I tried to do the same thing on Windows (to get stdin, stdout, and stderr to be non-blocking), but select doesn't work for file handles or pipes in Windows. If my memory serves me correctly, it was fairly trivial. I've actually got a bit of other code for buffered IO on sockets that is an asynchat work-alike, is around 70 lines, and uses asyncore. Using asynchat itself and setting a proper terminator would be easy, less than 20 lines (by my estimation).
You include all the logic of asyncore.poll in popen5.Popen.communicate. Using asyncore.poll may be to your benefit.
One thing to note is that string concatenation is not necessarily fast. That is: STR += DATA can be slow for initially large STR. Using a list with list.append and ''.join(list) is significantly faster for large buffers. I have been contemplating submitting a patch for asynchat that makes it behave better for large buffers.
The only issue that I can see in using asyncore or asynchat is that each of {stdin, stdout, stderr} may need to have their own instance of asyncore.dispatcher or asynchat.async_chat. That would be pretty ugly. I still like the idea of popen5, but I think you may want to at least take a look at asyncore and friends.
- Josiah
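The buffer-accumulation point in the message above can be sketched as follows. The function names are hypothetical and not part of popen5, asyncore, or asynchat; how large the difference is depends on the interpreter and the buffer sizes, so this shows the two patterns rather than a benchmark.

```python
def collect_by_concat(chunks):
    """Accumulate chunks by repeated concatenation (the STR += DATA pattern)."""
    buf = b""
    for chunk in chunks:
        buf += chunk  # may copy the whole buffer on each iteration
    return buf


def collect_by_join(chunks):
    """Accumulate chunks in a list and join them once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # cheap: only stores a reference
    return b"".join(parts)   # a single pass builds the final buffer


if __name__ == "__main__":
    data = [b"x" * 1024] * 1000
    # Both approaches produce the same bytes; only the cost differs.
    assert collect_by_concat(data) == collect_by_join(data)
```

Both functions return identical data, so a communicate()-style read loop could switch from the first pattern to the second without changing its result.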
Peter Åstrand wrote:
This PEP describes a new module for starting and communicating with processes on POSIX systems.
I see many aspects in this PEP that improve the existing implementation without changing the interface. I would suggest that you try to enhance the existing API (making changes to its semantics where reasonable), instead of coming up with a completely new module. With that approach, existing applications could use these features with no or little change.
- One "unified" module provides all functionality from previous functions.
I doubt this is a good thing. Different applications have different needs - having different API for them is reasonable.
- Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent. This means that it's easy to handle exec() failures, for example. With popen2, for example, it's impossible to detect if the execution failed.
This is a bug in popen2, IMO. Fixing it is a good thing, but does not require a new module.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
Such a hook could be merged as a keyword argument into the existing API.
- No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
This could be an option to the existing API. Make sure it works on all systems, though.
- All combinations of file descriptor redirection are possible. For example, the "python-dialog" [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
Sounds like a new function on the popen2 module.
- With popen5, it's possible to control if all open file descriptors should be closed before the new program is executed.
This should be an option on the existing API.
- Support for connecting several subprocesses (shell "pipe").
Isn't this available already, as the shell supports pipe creation, anyway?
- Universal newline support.
This should be merged into the existing code.
- A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop.
Isn't asyncore supposed to simplify that? So in short, I'm -1 on creating a new module, but +1 on merging most of these features into the existing code base - they are good features. Regards, Martin
On Sat, 3 Jan 2004, Martin v. Loewis wrote:
- One "unified" module provides all functionality from previous functions.
I doubt this is a good thing. Different applications have different needs - having different API for them is reasonable.
I don't agree. I have used all of the existing mechanisms in lots of apps, and it's just a pain. There are lots of functions to choose between, but none does what you really want.
- Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent. This means that it's easy to handle exec() failures, for example. With popen2, for example, it's impossible to detect if the execution failed.
This is a bug in popen2, IMO. Fixing it is a good thing, but does not require a new module.
"Fixing popen2" would mean breaking old applications; exceptions will be raised which apps are not prepared for.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
Such a hook could be merged as a keyword argument into the existing API.
Into which module/method/function? There is no one flexible enough. The case for redirecting only stderr is just one example; this is simply not possible with the current API.
- All combinations of file descriptor redirection are possible. For example, the "python-dialog" [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
Sounds like a new function on the popen2 module.
To support all combinations, 12 different functions are necessary. Who will remember what popen2.popen11() means?
- Support for connecting several subprocesses (shell "pipe").
Isn't this available already, as the shell supports pipe creation, anyway?
With popen5, you can do it *without* using the shell.
- Universal newline support.
This should be merged into the existing code.
There's already a bug about this; bug 788035. This is what one of the comments says: "But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly complicated with all the platform special cases that I'm loath to touch it..." I haven't checked if this is really true, though.
- A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop.
Isn't asyncore supposed to simplify that?
Probably not. The description says: "This module provides the basic infrastructure for writing asynchronous socket service clients and servers." It's not obvious to me how this module could be used as a "shell backquote" replacement (which is what communicate() is about). It's probably possible though; I haven't tried. Even if this is possible I guess we need some kind of "entry" or "wrapper" method in the popen module to simplify things for the user. My guess is that a communicate() method that uses asyncore would be as long/complicated as the current implementation. The current implementation is only 68 lines, including comments.
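As a rough illustration of the "shell backquote" use that communicate() is meant to cover, here is a minimal sketch. It uses the standard `subprocess` module as a stand-in, which is an assumption: the popen5 draft's own signatures may differ. `backquote` and `ECHO_CHILD` are hypothetical names.

```python
import subprocess
import sys


def backquote(args, input_data=b""):
    """Run args, feed input_data to its stdin, return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() interleaves writing stdin with reading both output
    # pipes, so a full pipe buffer on one side cannot stall the other.
    out, err = proc.communicate(input_data)
    return proc.returncode, out, err


# A child that copies stdin to stdout, standing in for any filter program.
ECHO_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

if __name__ == "__main__":
    code, out, err = backquote(ECHO_CHILD, b"hello\n" * 100000)
    print(code, len(out), len(err))
```

The payload is deliberately larger than a typical pipe buffer; a naive "write everything, then read everything" loop is the case that risks a deadlock.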
So in short, I'm -1 on creating a new module, but +1 on merging most of these features into the existing code base - they are good features.
Well, I don't see how this could be done easily: The current API is not
flexible enough, and some things (like cross-process exceptions) break
compatibility.
Writing a good popen module is hard. Providing cross-platform support (for
Windows, for example) is even harder. Trying to retrofit a good popen
implementation into an old API without breaking compatibility seems
impossible to me. I'm not prepared to try.
--
/Peter Åstrand
Peter Astrand wrote:
I don't agree. I have used all of the existing mechanisms in lots of apps, and it's just a pain. There are lots of functions to choose between, but none does what you really want.
So enhance them, instead of replacing them.
- Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent.
This is a bug in popen2, IMO. Fixing it is a good thing, but does not require a new module.
"Fixing popen2" would mean breaking old applications; exceptions will be raised which apps are not prepared for.
I find that an acceptable incompatibility, and it will likely break no existing application. Applications usually expect that the program they start actually exists; it is a good thing that they can now detect the error of a missing/non-executable application. Errors should never pass silently.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
Such a hook could be merged as a keyword argument into the existing API.
Into which module/method/function?
For example, popen2.popen2, as argument preexec_fn.
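To make the fork/exec hook concrete, here is a small sketch of a `preexec_fn` keyword argument. It uses the standard `subprocess` module (POSIX only for this keyword) as a stand-in, which is an assumption; it is not the popen2 or popen5 code under discussion. `run_with_umask` and `REPORT_UMASK` are hypothetical names.

```python
import os
import subprocess
import sys


def run_with_umask(args, mask):
    """Run args with the given umask applied in the child only (POSIX)."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        # Called in the child after fork() and before exec(); the lambda
        # binds the argument, so no separate "args" parameter is needed.
        preexec_fn=lambda: os.umask(mask),
    )
    out, _ = proc.communicate()
    return proc.returncode, out


# Child that reports the umask it inherited (os.umask returns the old value).
REPORT_UMASK = [sys.executable, "-c", "import os; print(os.umask(0))"]

if __name__ == "__main__":
    print(run_with_umask(REPORT_UMASK, 0o027))
```

The parent's umask is left unchanged, because the hook only runs on the child side of the fork.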
There is no one flexible enough. The case for redirecting only stderr is just one example; this is simply not possible with the current API.
Can you elaborate? What is the specific problem, what does your preexec function look like, and how is it used with popen5? I can then show you how it could be used with popen2, if that was enhanced appropriately.
- All combinations of file descriptor redirection are possible. For example, the "python-dialog" [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
Sounds like a new function on the popen2 module.
To support all combinations, 12 different functions are necessary. Who will remember what popen2.popen11() means?
Why is that? Just add a single function, with arguments stdin/stdout/stderr. No need for 12 functions. Then explain the existing functions in terms of your new function (if possible).
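A single entry point with stdin/stdout/stderr arguments might look roughly like the sketch below, which covers the "redirect stderr but not stdout" case. `spawn`, `capture_stderr_only` and `WARN_CHILD` are hypothetical names, and the standard `subprocess` module is used as a stand-in for whichever module would host such a function.

```python
import subprocess
import sys


def spawn(args, stdin=None, stdout=None, stderr=None):
    """Hypothetical single entry point: a stream is inherited unless given."""
    return subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr)


# Child that writes only to stderr.
WARN_CHILD = [sys.executable, "-c", "import sys; sys.stderr.write('warning\\n')"]


def capture_stderr_only(args):
    """Redirect stderr to a pipe while stdin and stdout stay inherited."""
    proc = spawn(args, stderr=subprocess.PIPE)
    _, err = proc.communicate()  # stdout was not piped, so it comes back as None
    return proc.returncode, err


if __name__ == "__main__":
    print(capture_stderr_only(WARN_CHILD))
```

Each of the three streams is chosen independently, which is what removes the need for one function per combination.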
- Support for connecting several subprocesses (shell "pipe").
Isn't this available already, as the shell supports pipe creation, anyway?
With popen5, you can do it *without* using the shell.
Why is that a good thing?
There's already a bug about this; bug 788035. This is what one of the comments says:
"But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly complicated with all the platform special cases that I'm loath to touch it..."
I haven't checked if this is really true, though.
You really should work with the existing code base. Ignoring it is a guarantee that your PEP will be rejected. (Studying it, and then providing educated comments about it, might get you through.) I think this is the core problem of your approach: You throw away all past history, and imply that you can do better than all prior contributors could. Honestly, this is doubtful. The current code is so complicated because implementing pipes is complicated.
Well, I don't see how this could be done easily: The current API is not flexible enough, and some things (like cross-process exceptions) break compatibility.
I never said it would be easy. However, introducing a new popen module is a major change, and there must be strong indications that the current API cannot be enhanced before throwing it away. There should be one-- and preferably only one --obvious way to do it.
As for breaking compatibility: This is what the PEP should study in detail. It is sometimes acceptable to break compatibility, if applications are likely to be improved by the change. *Any* change can, in principle, break compatibility. Suppose I had an application that did
from popen5 import open
This application might break if your proposed change is implemented, as a new module is added. So you can't claim "I will break no programs".
Writing a good popen module is hard. Providing cross-platform support (for Windows, for example) is even harder. Trying to retrofit a good popen implementation into an old API without breaking compatibility seems impossible to me. I'm not prepared to try.
So I continue to be -1 with your PEP. Regards, Martin
On Sun, 4 Jan 2004, Martin v. Loewis wrote:
"Fixing popen2" would mean breaking old applications; exceptions will be raised which apps are not prepared for.
I find that an acceptable incompatibility, and it will likely break no existing application.
Not true. There are lots of apps out there that use fallback commands: they try to execute one, and if it doesn't exist, try another one. (One example is jakarta-gump, see http://cvs.apache.org/viewcvs.cgi/jakarta-gump/python/gump/utils/launcher.py?rev=1.6&view=auto) With the current API, you do this by checking if the return code is 127. No-one is prepared for an exception.
The return code stuff is also very problematic, and is another reason to make a new module and not "enhance" the old ones. With the current API (which calls /bin/sh most of the time), some return codes are overloaded by the shell. The shell uses these return codes:
126: the command was found but is not executable
127: the command was not found
128+n: the command was terminated by signal n
This means that it is currently impossible to use these return codes for programs launched via the current API, since you cannot tell the difference between a 127 generated by a successful call to your command, and a 127 generated by the shell. I don't see how this can be solved by "enhancing" the current functions, without breaking old applications.
Applications usually expect that the program they start actually exists; it is a good thing that they can now detect the error of a missing/non-executable application.
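The return-code ambiguity described above can be reproduced with a short sketch. It assumes a POSIX system with /bin/sh and uses the standard `subprocess` module as a stand-in for both the shell-based API and the proposed one; `run_via_shell`, `run_directly` and `MISSING` are hypothetical names.

```python
import subprocess


def run_via_shell(command):
    """Run a command line through the shell and return the exit status."""
    return subprocess.call(command, shell=True, stderr=subprocess.DEVNULL)


def run_directly(argv):
    """Run argv without a shell; return (status, None) or (None, exception)."""
    try:
        return subprocess.call(argv), None
    except OSError as exc:
        # The exec failure surfaces in the parent as an exception.
        return None, exc


# Hypothetical command name, assumed not to exist on PATH.
MISSING = "no-such-command-popen5-demo"

if __name__ == "__main__":
    # Both calls yield the same status, so the caller cannot tell a
    # missing command from a program that itself exited with 127.
    print(run_via_shell(MISSING), run_via_shell("exit 127"))
    print(run_directly([MISSING]))
```

Without the shell in between, the full 0-255 range stays available to the program, and a failed exec is reported separately from any exit status.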
There are lots of other errors as well, not just missing/non-executable programs.
There is no one flexible enough. The case for redirecting only stderr is just one example; this is simply not possible with the current API.
Can you elaborate? What is the specific problem, what does your preexec function look like, and how is it used with popen5? I can then show you how it could be used with popen2, if that was enhanced appropriately.
Yes, the preexec function feature could possibly be added to popen2. This is not the problem.
Sounds like a new function on the popen2 module.
To support all combinations, 12 different functions are necessary. Who will remember what popen2.popen11() means?
Why is that? Just add a single function, with arguments stdin/stdout/stderr. No need for 12 functions. Then explain the existing functions in terms of your new function (if possible).
Just like popen5.Popen? Yes, that could be done. We would still have the problem with returncode incompatibilities, exceptions and such.
With popen5, you can do it *without* using the shell.
Why is that a good thing?
1) Performance. No need for parsing .bashrc on every call...
2) Security. You can do pipes without having to deal with all the quoting issues.
3) Getting rid of the shell's overloading of return codes.
It's also much more elegant, IMHO.
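The quoting point can be illustrated with a minimal sketch: when arguments are passed as a list and no shell is involved, metacharacters in an argument are never interpreted. The standard `subprocess` module is used as a stand-in (an assumption), and `echo_argument` and `PRINT_ARG` are hypothetical names.

```python
import subprocess
import sys

# Child that prints its first argument verbatim.
PRINT_ARG = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def echo_argument(value):
    """Pass value as a single argv entry; no shell ever parses it."""
    out = subprocess.check_output(PRINT_ARG + [value])
    return out.decode().rstrip("\r\n")


if __name__ == "__main__":
    # The ';', '|' and '$' characters reach the child untouched.
    print(echo_argument("a; echo b | cat $HOME"))
```

The same string handed to a shell-based call would need careful escaping to avoid running `echo` and `cat` as separate commands.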
I think this is the core problem of your approach: You throw away all past history, and imply that you can do better than all prior contributors could. Honestly, this is doubtful.
In a discussion like this, I think it's important to separate the new API from the new implementation:
1) The new API. If you look at the popen5 implementation and PEP, it's obvious that I haven't thrown away the history. I have tried to take all the good parts from the various existing functions. The documentation contains 140 lines describing how to migrate from the earlier functions. Much of the current API has really never been designed. The API for the functions os.popen, os.system, os.popen2 comes from the old POSIX functions. These were never intended to be flexible, cross-platform or anything like that. So, it's not hard to do better than these.
2) The new implementation. When I wrote popen5, I took some good ideas out of popen2. The rest of the code is written from scratch.
The current code is so complicated because implementing pipes is complicated.
Let's keep the POSIX stuff separated from the Windows stuff. popen2.py does not depend on posixmodule.c on POSIX systems, and popen2.py is not complicated at all. The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated because:
1) It's written in low-level C
2) It contains lots of old DOS stuff
3) It tries to launch the program through the shell (which is always a pain).
Well, I don't see how this could be done easily: The current API is not flexible enough, and some things (like cross-process exceptions) break compatibility.
I never said it would be easy. However, introducing a new popen module is a major change, and there must be strong indications that the current API cannot be enhanced before throwing it away.
I wouldn't say that introducing a new module is a "major change". Of course, we don't want to end up writing "popen6" in two years, because we've realized that "popen5" is too limited. That's why we should try to get it exactly right this time. I think it would be more useful if we put our energy into trying to accomplish that.
As for breaking compatibility: This is what the PEP should study in detail. It is sometimes acceptable to break compatibility, if applications are likely to be improved by the change. *Any* change can, in principle, break compatibility. Suppose I had an application that did
from popen5 import open
This application might break if your proposed change is implemented, as a new module is added. So you can't claim "I will break no programs".
Isn't this quite a silly example?
--
/Peter Åstrand
Peter Astrand wrote:
The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated because:
1) It's written in low-level C
2) It contains lots of old DOS stuff
Such code could be eliminated now; DOS is not supported anymore. Would you like to contribute a patch in that direction? Regards, Martin
Peter,
After seeing the discussion that your PEP has sparked (without reading all of it) I am torn. I agree that Python's subprocess management facilities can use a lot of work. I agree that your code is by and large an improvement over popen2 and friends. But I'm not sure that it is *enough* of an improvement to be accepted into the standard library.
What I'm looking for in standard library additions is "category-killers". IMO good recent examples are optparse and the logging module. Both were in active use and had independent distributions with happy users before they were accepted into the standard library.
Note that in my definition, a category-killer doesn't have to have all conceivable features -- it has to strike the right balance between feature completeness and a set of properties that are usually referred to by words like elegance, simplicity, ease-of-use, ease-of-learning. For example, optparse specifically does not support certain styles of option parsing -- it strives to encourage uniformity in option syntax, and that rules out certain features.
(I'm not saying that your popen5 has too many features -- I'm just warning that I'm not asking you to make it a category-killer by adding tons more features. I don't want more features, I want the satisfying feeling that this is the "best" solution, for my very personal definition of "best".)
I note that your source code at http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/popen5/?cvsroot=python-popen5 has some useful comments that were missing from the PEP, e.g. how to replace various older APIs with calls to popen5. It also has some motivation for the API choices you made that are missing from the PEP (e.g. why there are options for setting cwd).
So what's missing?
For the PEP, I would say that a lot of explanatory and motivational text from the source code comments and doc strings should be added. (PEPs don't have to be terse formal documents that describe the feature in as few words as possible; that can be one section of the PEP, but in general a PEP needs to provide motivation and insight as well as specification.)
For the code, I think that a Windows version is essential. It would be okay if a small amount of code (e.g. making some basic Windows APIs available, like CreateProcess) had to be coded in C, as long as the bulk could still be in Python. I really like having this kind of code in Python -- it helps understanding the fine details, and it allows subclassing more easily.
I also wonder if more support for managing a whole flock of subprocesses might not be a useful addition; this is a common need in some applications. And with this, I'd like to see explicit support (somehow -- maybe through a separate class) for managing "daemon" processes. This certainly is a common need!
I would like this to replace all other "high-level" subprocess management facilities, in particular os.system() and os.popen() and the popen2 module, on all platforms. Oh, and also os.spawn*(). But not os.fork() and os.exec*(), since those are the building blocks (on Unix, anyway).
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, 5 Jan 2004, Guido van Rossum wrote:
What I'm looking for in standard library additions is "category-killers". IMO good recent examples are optparse and the logging module. Both were in active use and had independent distributions with happy users before they were accepted into the standard library.
Thanks, this is very useful feedback.
For the code, I think that a Windows version is essential. It would be okay if a small amount of code (e.g. making some basic Windows APIs available, like CreateProcess) had to be coded in C, as long as the bulk could still be in Python. I really like having this kind of code in Python -- it helps understanding the fine details, and it allows subclassing more easily.
Sounds good to me.
I also wonder if more support for managing a whole flock of subprocesses might not be a useful addition; this is a common need in some applications. And with this, I'd like to see explicit support (somehow -- maybe through a separate class) for managing "daemon" processes. This certainly is a common need!
I'll put this on the TODO list.
--
/Peter Åstrand
This PEP looks fantastic to me; I've often wanted a better set of primitives for working with external processes, and your PEP has just about everything I've ever wanted. I do have some minor nits to pick, though:
- The preexec_args argument is extraneous considering how easy it is to use a lambda construct to bind arguments to a function. Consider:
    Popen(..., preexec_fn=foo, preexec_args=(bar,baz))
  vs.
    Popen(..., preexec_fn=lambda: foo(bar,baz))
- Rather than passing argv0 as a named argument, what about passing in the program to execute instead? It would simplify code like this:
    Popen(child_program, *child_args[1:], argv0=child_args[0])
  into this
    Popen(*child_args, program=child_program)
  Of course I'm not suggesting you change the default behavior of taking argv0 *and* the program to execute from the first positional argument.
- The defaults for close_fds and universal_newlines should be False rather than 0; this would make it clear from reading the synopsis that these arguments are boolean-valued.
- The poll() method and returncode attribute aren't both needed. I like returncode's semantics better since it gives more information and it avoids using -1 as a magic number.
- Rather than raising a PopenException for incorrect arguments, I would think that raising ValueError or TypeError would be more in line with users' expectations.
- How does this communicate() method interact with I/O redirection?
- As you and others have mentioned, this API needs a better name than popen5; I like "process" a lot better.
- Would you consider adding an example of how to chain processes together into a pipeline? You say this is possible, and I'm assuming that's what the PIPE constant is for, but I'd like to see it written out to make sure I'm understanding it correctly.
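On the last point, a shell-free pipeline might be written roughly as in the sketch below. It uses the `PIPE` constant of the standard `subprocess` module as a stand-in, which is an assumption; the popen5 draft's own spelling may differ. `run_pipeline` and the two child commands are hypothetical.

```python
import subprocess
import sys

# Producer prints two unsorted lines; consumer sorts whatever it reads.
PRODUCER = [sys.executable, "-c", "print('pear'); print('apple')"]
CONSUMER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))",
]


def run_pipeline():
    """Connect PRODUCER's stdout to CONSUMER's stdin without using a shell."""
    first = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        CONSUMER, stdin=first.stdout, stdout=subprocess.PIPE
    )
    # Close the parent's copy of the read end so that the consumer is
    # the only reader left on the pipe.
    first.stdout.close()
    out, _ = second.communicate()
    first.wait()
    return out


if __name__ == "__main__":
    print(run_pipeline())
```

This corresponds to `producer | consumer` in a shell, with the pipe created by the parent process instead of by /bin/sh.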
At 04-01-2004 00:14, Martin v. Loewis wrote:
With popen5, you can do it *without* using the shell.
Why is that a good thing?
Because using the shell on windows is causing a DOS box window to appear for every popen2/3/4 use in a windowed python program on Windows. Barry
participants (7)
- Barry Scott
- Guido van Rossum
- John Williams
- Josiah Carlson
- Martin v. Loewis
- Peter Astrand
- Peter Åstrand