A question about the subprocess implementation

The subprocess.Popen constructor takes stdin, stdout and stderr keyword arguments which are supposed to represent the file handles of the child process. The object also has stdin, stdout and stderr attributes, which one would naively expect to correspond to the passed-in values, except where you pass in e.g. subprocess.PIPE (in which case the corresponding attribute would be set to an actual stream or descriptor). However, in common cases, even when keyword arguments are passed in, the corresponding attributes are set to None. The following script

    import os
    from subprocess import Popen, PIPE
    import tempfile

    cmd = 'ls /tmp'.split()

    p = Popen(cmd, stdout=open(os.devnull, 'w+b'))
    print('process output streams: %s, %s' % (p.stdout, p.stderr))
    p = Popen(cmd, stdout=tempfile.TemporaryFile())
    print('process output streams: %s, %s' % (p.stdout, p.stderr))

prints

    process output streams: None, None
    process output streams: None, None

under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then the corresponding attribute *is* set: if the last four lines are changed to

    p = Popen(cmd, stdout=PIPE)
    print('process output streams: %s, %s' % (p.stdout, p.stderr))
    p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE)
    print('process output streams: %s, %s' % (p.stdout, p.stderr))

then you get

    process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None
    process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40>

under Python 2.7, and

    process output streams: <_io.FileIO name=3 mode='rb'>, None
    process output streams: None, <_io.FileIO name=5 mode='rb'>

under Python 3.2.

This seems to me to contradict the principle of least surprise. One would expect, when a file-like object is passed in as a keyword argument, that it be placed in the corresponding attribute. That way, if one wants to do p.stdout.close() (which is necessary in some cases), one doesn't hit an AttributeError because NoneType has no attribute 'close'.
This seems like it might be a bug, but if so it does seem rather egregious: can someone tell me if there is a good design reason for the current behaviour? If there isn't one, I'll raise an issue. Regards, Vinay Sajip

On 1/7/2012 4:25 PM, Vinay Sajip wrote:
The behavior matches the doc:

    Popen.stdin
        If the stdin argument was PIPE, this attribute is a file object that provides input to the child process. Otherwise, it is None.

-- ditto for Popen.stdout, .stderr

I believe you are expected to keep a reference to anything you pass in.

    pout = open(os.devnull, 'w+b')
    p = Popen(cmd, stdout=pout, stderr=PIPE)

The attributes were added for the case when you do not otherwise have access.
> This seems like it might be a bug, but if so it does seem rather egregious:
It would be egregious if it were a bug, but it is not.
> can someone tell me if there is a good design reason for the current behaviour? If there isn't one, I'll raise an issue.
That seems like a possibly reasonable enhancement request. But the counterargument might be that you have to separately keep track of the need to close anyway. Or that you should do things like

    with open(os.devnull, 'w+b') as pout:
        p = Popen(cmd, stdout=pout, stderr=PIPE)

-- Terry Jan Reedy
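A minimal runnable sketch of the with-based pattern suggested above. The child command (a one-line Python script run via sys.executable) is an assumption chosen for portability and is not from the thread; it stands in for 'ls /tmp'.

```python
import os
import sys
from subprocess import Popen, PIPE

# Assumed stand-in for the thread's 'ls /tmp': a child that writes to both streams.
cmd = [sys.executable, '-c',
       'import sys; sys.stdout.write("out"); sys.stderr.write("err")']

with open(os.devnull, 'wb') as pout:
    # stdout goes to the file we opened; stderr goes to a pipe created by Popen.
    p = Popen(cmd, stdout=pout, stderr=PIPE)
    # communicate() reads the stderr pipe to EOF and waits for the child.
    _, err = p.communicate()

# The caller still holds 'pout'; the with block has already closed it.
print('p.stdout: %r, stderr data: %r, pout closed: %s'
      % (p.stdout, err, pout.closed))
```

Here p.stdout stays None because a file object was passed, while p.stderr is the parent's end of the pipe that Popen created.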

Terry Reedy <tjreedy <at> udel.edu> writes:
Right, but it's not very helpful, nor especially intuitive. Why does it have to be None in the case where you pass in a file object? Is there some benefit to be gained by doing this? Does something bad happen if you store that file object in proc.stdin / proc.stdout / proc.stderr?
> I believe you are expected to keep a reference to anything you pass in.
This can of course be done, but it can make code less clear than it needs to be. For example, if you run a subprocess asynchronously, the code that makes the Popen constructor call can be in a different place to the code that e.g. captures process output after completion. For that code to know how the Popen was constructed seems to make coupling overly strong.
It may be that the close() needs to be called whether you passed PIPE in, or a file-like object - (a) because of the need to receive and handle SIGPIPE in command pipelines, and (b) because it's e.g. set to a pipe you constructed yourself, and you need to close the write end before you can issue an unsized read on the read end. So the close logic would have to do e.g.

    if proc.stdout is not None:
        proc.stdout.close()
    else:
        # pull out the reference from some other place and then close it
        pass

rather than just

    proc.stdout.close()

It's doable, of course. The with construction you suggested isn't usable in the general case, where the close() code is in a different place from the code which fires off the subprocess.

Of course, since the behaviour matches the docs it would be an enhancement request rather than a bug report. I was hoping someone could enlighten me as to the *reason* for the current behaviour ... as it is, subprocess comes in for some stick in the community for being "hard to use" ...

Regards,
Vinay Sajip
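Case (b) above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the child command is a hypothetical one-liner, and the pipe is created by the caller with os.pipe() rather than by Popen.

```python
import os
import sys
from subprocess import Popen

# Assumed child command, chosen only so the example runs anywhere.
cmd = [sys.executable, '-c', 'print("hello from child")']

read_fd, write_fd = os.pipe()
# Popen accepts an existing file descriptor; the child gets a copy of the write end.
proc = Popen(cmd, stdout=write_fd)

# ... possibly much later, in different code ...
proc.wait()
# Close the parent's copy of the write end. Without this, the unsized
# read() below would never see EOF and would block.
os.close(write_fd)

with os.fdopen(read_fd, 'rb') as reader:
    output = reader.read()

print('proc.stdout: %r, output: %r' % (proc.stdout, output))
```

Note that proc.stdout is None here even though the child's stdout is a pipe, since the pipe was passed in rather than requested with PIPE. The sketch also assumes the child's output fits in the pipe buffer, since wait() is called before reading.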

On 2012-01-08 10:48, Vinay Sajip wrote:
proc.stdin, proc.stdout, and proc.stderr aren't meant to be a reference to the file that got connected to the subprocess' stdin/stdout/stderr. They are meant to be a reference to the OTHER END of the pipe that got connected. When you pass in a normal file object there is no such thing as the OTHER END of that file. The value None reflects this fact, and should continue to do so. -Phil
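To illustrate the "other end of the pipe" reading of these attributes, here is a small sketch. The child command is an assumption for the example (it upper-cases whatever it reads on stdin); only the Popen attributes themselves come from the discussion.

```python
import sys
from subprocess import Popen, PIPE

# Assumed child: echoes its stdin back in upper case.
cmd = [sys.executable, '-c',
       'import sys; sys.stdout.write(sys.stdin.read().upper())']

proc = Popen(cmd, stdin=PIPE, stdout=PIPE)

# proc.stdin is the parent's WRITE end of the pipe feeding the child's stdin.
proc.stdin.write(b'ping')
proc.stdin.close()          # child sees EOF on its stdin

# proc.stdout is the parent's READ end of the pipe the child writes to.
reply = proc.stdout.read()
proc.stdout.close()
proc.wait()

print('reply: %r, proc.stderr: %r' % (reply, proc.stderr))
```

stderr was not requested as PIPE, so there is no parent-side end to expose and proc.stderr is None.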

On Sat, 7 Jan 2012 21:25:37 +0000 (UTC) Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Note that this is documented behavior for these attributes.
Since the only reason they exist is so you can access your end of a pipe, setting them to anything else would seem to be a bug. I'd argue that their existence is more of a POLA (principle of least astonishment) violation than them having the value None. But None is easier than a call to hasattr.
You can close the object you passed in if it wasn't PIPE. If you passed in PIPE, the object has to be exposed some way, otherwise you *can't* close it.

This did raise one interesting question, which will go to ideas...

<mike
--
Mike Meyer <mwm@mired.org> http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Mike Meyer <mwm <at> mired.org> writes:
I don't follow your reasoning, re. why setting them to a handle used for subprocess output would be a bug - it's logically the same as the PIPE case. For example, I might have a pipe (say, constructed using os.pipe()) whose write end is intended for the subprocess to output to, and whose read end I want to hand off to some other code to read the output from the subprocess. However, if that other code does a read() on that pipe, it will hang until the write handle for the pipe is closed. So, once the subprocess has terminated, I need to close the write handle. The actual reading might be done not in my code but in some client code of my code. While I could use some other place to store it, where's the problem in storing it in proc.stdout or proc.stderr?
Yes, I'm not disputing that I need to keep track of it - just that proc.stdout seems a good place to keep it. That way, the closing code can be de-coupled from the code that sets up the subprocess. A use case for this is when you want the subprocess and the parent to run concurrently/asynchronously, so the proc.wait() and subsequent processing happens at a different time and place to the kick-off. Regards, Vinay Sajip
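One way to get this decoupling without changing Popen is to bundle the process with the streams that were passed in. The sketch below is an assumption-laden illustration: the class name StartedProcess and its finish() method are hypothetical, not part of subprocess, and it does not handle PIPE (where wait() before reading could block).

```python
import sys
import tempfile
from subprocess import Popen


class StartedProcess(object):
    """Hypothetical bundle: a Popen plus the streams the caller passed in."""

    def __init__(self, cmd, stdout=None, stderr=None):
        # Remember what was passed so later code need not know how we were set up.
        self.passed = {'stdout': stdout, 'stderr': stderr}
        self.proc = Popen(cmd, stdout=stdout, stderr=stderr)

    def finish(self):
        """Wait for the child, then close any file-like objects passed in."""
        returncode = self.proc.wait()
        for stream in self.passed.values():
            if hasattr(stream, 'close'):   # skips None and integer descriptors
                stream.close()
        return returncode


# Kick-off code (the child command is an assumed stand-in):
out = tempfile.TemporaryFile()
started = StartedProcess([sys.executable, '-c', 'print("done")'], stdout=out)

# Clean-up code, which could live somewhere else entirely:
rc = started.finish()
print('return code: %d, out closed: %s' % (rc, out.closed))
```

The clean-up code only needs the StartedProcess object; it does not have to know whether a file, a descriptor, or nothing was passed at construction time.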

On Sun, 8 Jan 2012 02:06:33 +0000 (UTC) Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
No, it isn't. In the PIPE case, the value of the attributes isn't otherwise available to the caller. I think you're not following because you're thinking about what you want to do with the attributes:
> storing it [the fd] in proc.stdout or proc.stderr?
As opposed to what they're used for, which is communicating the fd's created in the PIPE case to the caller. Would you feel the same way if they were given the more accurate names "pipe_input" and "pipe_output"?
I disagree. Having the proc object keep track of these things for you is making it more complicated (by the admittedly trivial change of assigning those two attributes when they aren't used) so you can make your process creation code less complicated (by the equally trivial change of assigning the values in those two attributes when they are used). Since only the caller knows when this complication is needed, that's the logical place to put it.
> That way, the closing code can be de-coupled from the code that sets up the subprocess.
There are other ways to do that. It's still the same tradeoff - you're making the proc code more complicated to make the calling code simpler, even though only the calling code knows if that's needed.

<mike

participants (5)
- Daniel Neuhäuser
- Mike Meyer
- Phil Vandry
- Terry Reedy
- Vinay Sajip