what happens to Popen()'s parent-side file descriptors?

Chris Torek nospam at torek.net
Thu Oct 14 13:12:30 EDT 2010


In article <8bec27dd-b1da-4aa3-81e8-9665db04047b at n40g2000vbb.googlegroups.com>
>'Nobody' (clearly a misnomer!) and Chris, thanks for your excellent
>explanations about garbage collection. (Chris, I believe you must have
>spent more time looking at the subprocess source and writing your
>response than I have spent writing my code.)

Well, I just spent a lot of time looking at the code earlier
this week, as I was thinking about using it in a program that is
required to be "highly reliable" (i.e., to never lose data, even
if Things Go Wrong, such as a disk filling up or a sub-command failing).

(Depending on the shell and its version, "set -o pipefail" allows
some "cheating" here: with subprocess, you can use shell=True with
commands of the form "a | b" and still notice a failure in any stage:

    $ (exit 0) | (exit 2) | (exit 0)
    $ echo $?
    0
    $ set -o pipefail
    $ (exit 0) | (exit 2) | (exit 0)
    $ echo $?
    2

but -o pipefail is not POSIX and I am not sure I can count on
it.)
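
Without a shell, the same "did any stage fail?" information is
available straight from subprocess: start each stage with Popen, feed
each stage's stdout to the next stage's stdin, and wait for every
process. A minimal sketch for Python 3 on a POSIX system follows;
run_pipeline is a made-up helper name, not part of subprocess.

```python
import subprocess


def run_pipeline(*commands):
    """Run argv lists connected like "a | b | c", without a shell.

    Returns one exit status per command, so a failure in any stage is
    visible, not just the status of the last one.
    """
    procs = []
    prev_stdout = None  # read end of the pipe from the previous stage
    for i, argv in enumerate(commands):
        is_last = i == len(commands) - 1
        p = subprocess.Popen(
            argv,
            stdin=prev_stdout,
            # The last stage writes to our own stdout, like a shell would.
            stdout=None if is_last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            # The child now holds its own copy; dropping ours means the
            # upstream stage gets SIGPIPE if the downstream one exits early.
            prev_stdout.close()
        prev_stdout = p.stdout
        procs.append(p)
    # Reap every stage and collect all exit statuses.
    return [p.wait() for p in procs]
```

Here run_pipeline(["true"], ["false"], ["true"]) gives [0, 1, 0]:
the per-stage statuses that plain sh hides and pipefail only summarizes.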

>GC is clearly at the heart of my lack of understanding on this
>point. It sounds like, from what Chris said, that *any* file
>descriptor would be closed when GC occurs if it is no longer
>referenced, subprocess-related or not.

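Yes -- but, as noted elsethread, "delayed" failures from events
like "disk is full, can't write last bits of data" become problematic:
when the close happens implicitly during garbage collection, there is
no good place for such an error to be reported.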
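
One way to make such a failure show up where you can act on it is to
close the file yourself rather than leaving that to the garbage
collector. A minimal sketch in Python 3 (write_all is a hypothetical
helper name); on Linux, writing to /dev/full is an easy way to provoke
ENOSPC:

```python
def write_all(path, data):
    """Write bytes to path, closing the file explicitly.

    Small writes may sit in Python's buffer until close(), so an error
    such as ENOSPC ("No space left on device") can first show up there.
    Because the with-statement calls close() right here, that error is
    raised as OSError to our caller.  If the close were instead left to
    garbage collection, CPython could not raise it into our code at all.
    """
    with open(path, "wb") as f:
        f.write(data)
```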

>It sounds to me that, although my code might be safe now as is, I
>probably need to do an explicit p.stdXXX.close() myself for any pipes
>which I open via Popen() as soon as I am done with them.

Or use p.communicate(), which does the explicit close for you.
Note that if you are using a one-way pipe and doing your own I/O
-- as in your example -- p.communicate() will make just one more
read from the pipe (which returns end-of-file), close it, and wait
for the child, so you can ignore the result:

    import subprocess
    p = subprocess.Popen(["cat", "/etc/motd"], stdout=subprocess.PIPE)
    for line in p.stdout:          # do our own reading from the pipe
        print(line.rstrip())
    p.communicate()                # closes p.stdout, then waits for cat

The last call returns ('', None) (note: not ('', '') as I suggested
earlier; this time I actually typed it in at the interactive prompt).
Run python under strace and you can watch the close happen. This is
the [edited to fit] output after entering the p.communicate() line:

    read(0, "\r", 1)                        = 1
    write(1, "\n", 1
    )                       = 1
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost ...}) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0

[I press "enter"; readline echoes a newline and does tty ioctl()s]

    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigaction(SIGWINCH, {SIG_DFL}, {0xb759ed10, [], SA_RESTART}, 8) = 0
    time(NULL)                              = 1287075471

[no idea what these are really for, but the signal manipulation
appears to come from readline()]

    fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    _llseek(3, 0, 0xbf80d490, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
    read(3, "", 8192)                       = 0
    close(3)                                = 0

[fd 3 is the pipe reading from "cat /etc/motd" -- no idea what the
fstat64() and _llseek() are for here, but the read() and close() are
from the communicate() function]

    waitpid(13775, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 13775

[this is from p.wait(), which communicate() calls after the close]

    write(1, "(\'\', None)\n", 11('', None)
    )          = 11

[this is the result being printed, and the rest is presumably
readline() again]

    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    ioctl(0, TIOCGWINSZ, {ws_row=44, ws_col=80, ...}) = 0
    ioctl(0, TIOCSWINSZ, {ws_row=44, ws_col=80, ...}) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost ...}) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost ...}) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigaction(SIGWINCH, {0xb759ed10, [], SA_RESTART}, {SIG_DFL}, 8) = 0
    write(1, ">>> ", 4>>> )                     = 4
    select(1, [0], NULL, NULL, NULL
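
If you would rather do the cleanup yourself than rely on
communicate(), the two steps visible in the trace, the close() of the
pipe and the waitpid(), can be written out explicitly. A minimal
Python 3 sketch, assuming the command's output is UTF-8 text
(print_lines is a hypothetical name):

```python
import subprocess


def print_lines(argv):
    """Print a command's output line by line, then clean up explicitly.

    Returns the command's exit status.  The output is assumed to be
    UTF-8 text.
    """
    p = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        for line in p.stdout:
            print(line.decode("utf-8").rstrip())
    finally:
        # Close our end of the pipe (the close() in the trace) ...
        p.stdout.close()
    # ... then reap the child (the waitpid() in the trace).
    return p.wait()
```

In Python 3.2 and later, using the Popen object as a context manager
(with subprocess.Popen(...) as p:) likewise closes the pipes and waits
for the child when the block exits.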

>On a related point here, I have one case where I need to replace the
>shell construct
>
>   externalprog <somefile >otherfile
>
>I suppose I could just use os.system() here but I'd rather keep the
>Unix shell completely out of the picture (which is why I am moving
>things to Python to begin with!), so I'm just doing a simple open() on
>somefile and otherfile and then passing those file handles into
>Popen() for stdin and stdout. I am already closing those open()ed file
>handles after the child completes, but I suppose that I probably
>should also explicitly close Popen's p.stdin and p.stdout, too. (I'm
>guessing they might be dup()ed from the original file handles?)

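There is no dup()ing going on in the parent, so this is not necessary:
when you pass in your own file objects, Popen does not create p.stdin or
p.stdout at all (they are simply None). Only pipes you request with
PIPE get parent-side file objects, and the communicate function closes
those for you. In this case, though, I am not entirely sure subprocess
is the right hammer -- it mostly gives you portability to Windows (well,
plus the magic for preexec_fn and for reporting exec failure).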
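
For the externalprog <somefile >otherfile case, a minimal Python 3
sketch could look like this; run_redirected is a hypothetical helper,
and nothing beyond the two files you open needs closing, since p.stdin
and p.stdout are never created:

```python
import subprocess


def run_redirected(argv, infile, outfile):
    """Roughly the shell's  argv <infile >outfile  but with no shell.

    Popen uses only fileno() of the two file objects, so p.stdin and
    p.stdout stay None; the with-statement closes our two files.
    """
    with open(infile, "rb") as fin, open(outfile, "wb") as fout:
        p = subprocess.Popen(argv, stdin=fin, stdout=fout)
        return p.wait()
```

A call such as run_redirected(["externalprog"], "somefile", "otherfile")
then returns externalprog's exit status.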

Once again, peeking at the source is the trick :-) ... the arguments
you provide for stdin, stdout, and stderr are used thus:

            if stdin is None:
                pass
            elif stdin == PIPE:
                p2cread, p2cwrite = os.pipe()
            elif isinstance(stdin, int):
                p2cread = stdin
            else:
                # Assuming file-like object
                p2cread = stdin.fileno()

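(this is repeated for stdout and stderr). Of the resulting integer
file descriptors (or None if not applicable), only the parent's ends
of any pipes (p2cwrite, c2pread, errread) are passed to os.fdopen()
on the parent side, becoming p.stdin, p.stdout and p.stderr; the
descriptors you supplied yourself are used only by the child.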

(On the child side, the code does the usual shell-like dance
to move the appropriate descriptors to 0 through 2.)
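
For anyone who has not seen that dance, here is a rough sketch of it
using os.fork(), os.dup2() and os.execvp() directly. It is an
illustration for POSIX systems under Python 3, not the subprocess
implementation: spawn_with_fds is a hypothetical name, and the sketch
skips things subprocess does, such as reporting exec failure back to
the parent and handling a passed-in descriptor that is itself one of
0, 1 or 2.

```python
import os


def spawn_with_fds(argv, stdin_fd, stdout_fd):
    """Run argv with stdin_fd as fd 0 and stdout_fd as fd 1 (POSIX only).

    Returns the child's exit status (127 if the exec itself failed).
    """
    pid = os.fork()
    if pid == 0:
        # Child: must never return into the caller's code.
        try:
            os.dup2(stdin_fd, 0)   # reads now come from stdin_fd
            os.dup2(stdout_fd, 1)  # writes now go to stdout_fd
            # Drop the original descriptors so the new program sees
            # only 0, 1 and 2 (skip any that already were 0..2).
            for fd in {stdin_fd, stdout_fd}:
                if fd > 2:
                    os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if something above failed
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```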
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list