[ python-Bugs-1634739 ] Problem running a subprocess

SourceForge.net noreply at sourceforge.net
Sun Jan 21 21:22:06 CET 2007


Bugs item #1634739, was opened at 2007-01-13 16:46
Message generated for change (Comment added) made by astrand
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1634739&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Closed
Resolution: Invalid
Priority: 5
Private: No
Submitted By: Florent Rougon (frougon)
Assigned to: Peter Åstrand (astrand)
Summary: Problem running a subprocess

Initial Comment:
Hello,

I have a problem running a subprocess from Python (see below). I first ran into it with the subprocess module, but it's also triggered by a simple os.fork() followed by os.execvp().

So, what is the problem, exactly? I have written the exact same minimal program in C and in Python, which uses fork() and execvp() in the most straightforward way to run the following command:

  transcode -i /tmp/file.mpg -c 100-101 -o snapshot -y im,null -F png

(whose effect is to extract the 100th frame of /tmp/file.mpg and store it into snapshot.png)

The C program runs fast with no error, while the one in Python takes from 60 to 145 times longer (!), and triggers error messages from transcode. This shouldn't happen, since both programs are merely calling transcode in the same way to perform the exact same thing.

Experiments
------------

1. First, I run the C program (extract_frame) twice on a first .mpg file (MPEG 2 PS) [the first time fills the block IO cache], and store the output in extract_frame.output:

  % time ./extract_frame >extract_frame.output 2>&1
  ./extract_frame > extract_frame.output 2>& 1  0.82s user 0.33s system 53% cpu 2.175 total
  % time ./extract_frame >extract_frame.output 2>&1
  ./extract_frame > extract_frame.output 2>& 1  0.79s user 0.29s system 96% cpu 1.118 total

Basically, this takes 1 or 2 seconds. extract_frame.output is attached.

Second, I run the Python program (extract_frame.py) on the same .mpg file, and store the output in extract_frame.py.output:

  % time ./extract_frame.py >extract_frame.py.output 2>&1 
  ./extract_frame.py > extract_frame.py.output 2>& 1  81.59s user 25.98s system 66% cpu 2:42.51 total

This takes more than 2 *minutes*, not seconds!
(of course, the system is idle for all tests)

In extract_frame.py.output, the following error message appears quickly after the process is started:

  failed to write Y plane of frame(demuxer.c) write program stream packet: Broken pipe

which is in fact composed of two error messages, the second one starting at "(demuxer.c)".

Once these messages are printed, the transcode subprocesses[1] seem to hang (with relatively high CPU usage), but eventually complete, after 2 minutes or so.

There are no such error messages in extract_frame.output.

2. Same test with another .mpg file. As far as time is concerned, we have the same problem:

  [C program]
  % time ./extract_frame >extract_frame.output2 2>&1     
  ./extract_frame > extract_frame.output2 2>& 1  0.73s user 0.28s system 43% cpu 2.311 total

  [Python program]
  % time ./extract_frame.py >extract_frame.py.output2 2>&1
  ./extract_frame.py > extract_frame.py.output2 2>& 1  92.84s user 12.20s system 76% cpu 2:18.14 total

We also get the first error message in extract_frame.py.output2:

  failed to write Y plane of frame

when running extract_frame.py, but this time, we do *not* have the second error message:

  (demuxer.c) write program stream packet: Broken pipe

All this is reproducible with Python 2.3, 2.4 and 2.5 (Debian packages in sarge for 2.3 and 2.4, vanilla Python 2.5).

  % python2.5 -c 'import sys; print sys.version; print "%x" % sys.hexversion'
  2.5 (r25:51908, Jan  5 2007, 17:35:09) 
  [GCC 3.3.5 (Debian 1:3.3.5-13)]
  20500f0

  % transcode --version
  transcode v1.0.2 (C) 2001-2003 Thomas Oestreich, 2003-2004 T. Bitterberg

I'd hazard that Python is tweaking some process or threading parameter that is inherited by subprocesses and disturbs transcode, which doesn't happen when calling fork() and execvp() from a C program, but am unfortunately unable to precisely diagnose the problem.

Many thanks for considering.

Regards,

Florent

  [1] Plural because the transcode process spawns several childs: tcextract, tcdemux, etc.

----------------------------------------------------------------------

>Comment By: Peter Åstrand (astrand)
Date: 2007-01-21 21:22

Message:
Logged In: YES 
user_id=344921
Originator: NO

>I never wrote that it was the listing of /proc/<pid>/fd that was
changing
>the behavior of transcode. Please don't put words in my mouth. I wrote
that
>some fds are open soon after the transcode process is started, and
quickly
>closed afterwards, when run from the Python test script.

Sorry about that, I did misunderstand you. 

>I wrote and attached minimal example programs that reproduce the bug.

The problem is that although extract_frame.py is short and "minimal", it
relies on the transcode program, which is a very complex piece of
software. I cannot reproduce your problem on my machine (but I haven't
tried very hard). 

>I wrote and attached minimal example programs that reproduce the bug.
>These programs show that, with a particular program (transcode), a
simple
>fork() + execvp() works fine in C but does not work in Python. *That* is
a
>problem for the Python Library.

Again, there's no clear evidence that there's actually some problem with
the Python library or even Python itself. Python is a very mature
software. The fork() and execvp() are very heavily used. It's unlikely
that there's something fundamentally wrong with them - if so, problems
would probably have turned up for other subprocesses as well, not just
transcode. It might be some minor difference between your C program and
Python. But: It's your job to point out what you think it's wrong. 

>Transcode works perfectly fine when launched by my shell or the test
program in C that I attached to this bug report.

This doesn't prove that Python is guilty. This only proves that you have
failed to reproduce the problem from your C test program and the shell. It
might still be a bug in transcode that's only triggered by some corner case
or timing issue. Say, a race condition. 


>Of course it's a problem in subprocess, since it's not able to run a
>program the same way as a simple fork() + execvp() does in C.

>From what you have described, the program is executed exactly in the same
way from Python as from your C program (wrt open file descriptors after
launch, program arguments etc). 

>Otherwise, please stop wasting my time and rendering the Python BTS
useless.

The problem with me wasting your time can easily be solved, I'll just stop
trying to help you...


----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-21 17:24

Message:
Logged In: YES 
user_id=310088
Originator: YES

I never wrote that it was the listing of /proc/<pid>/fd that was changing
the behavior of transcode. Please don't put words in my mouth. I wrote
that some fds are open soon after the transcode process is started, and
quickly closed afterwards, when run from the Python test script.

The rest of your answer again shows that you didn't read the bug report.
I'll repeat a last time. The title of this bug report is "Problem running
a subprocess". It is *not* "Problem with subprocess.py", although it of
course happens with subprocess.py, since this module relies (on POSIX
operating systems) on os.fork() and os.exec*().

Yes, I could have reported the problem against subprocess.py, since the
problem does exist there. But I tried to be a good citizen and do my part
of the job---to the point where I wasn't able to go further. I figured out
the problem existed in the basic building blocks of subprocess.py, i.e.
os.fork() and os.exec*(), and thus spared you the time needed to find this
out on your own.

I wrote and attached minimal example programs that reproduce the bug.
These programs show that, with a particular program (transcode), a simple
fork() + execvp() works fine in C but does not work in Python. *That* is a
problem for the Python Library.

Finally, to justify the closing of this bug, you wrote:

  "After all, it's transcode that runs slowly and gives errors. This
suggests that the problem is actually in transcode rather than in the
Python subprocess module."

No, no, no. Transcode works perfectly fine when launched by my shell or
the test program in C that I attached to this bug report. It is when run
from Python that the aforementioned problems happen.

You also wrote in the end:

  "Please re-open this bug if you find any indication of that it's
actually subprocess that does something wrong."

Of course it's a problem in subprocess, since it's not able to run a
program the same way as a simple fork() + execvp() does in C... because of
the underlying machinery (fork() and execvp() wrappers, particular settings
of the Python process---I don't know), not because of subprocess.py,
AFAICT. Hence the title I used for the report.

But I won't reopen the bug myself: there is no point in doing so if you
don't read the initial report nor the comments added later. If you're
willing to read the reports you're claiming to take care of, reopen it
yourself. Otherwise, please stop wasting my time and rendering the Python
BTS useless.

----------------------------------------------------------------------

Comment By: Peter Åstrand (astrand)
Date: 2007-01-21 16:37

Message:
Logged In: YES 
user_id=344921
Originator: NO

>That's the only thing I managed to get with the C version. But with the
>Python version, if I don't list the contents of /proc/<pid>/fd
immediately
>after the transcode process started,

I find it very hard to believe that just listing the contents of a
kernel-virtual directory can change the behaviour of an application. I
think it's much more likely that you have a timing issue.  

Since nothing indicates that there's actually a problem with the
subprocess module, I'm closing this bug for now. After all, it's transcode
that runs slowly and gives errors. This suggests that the problem is
actually in transcode rather than in the Python subprocess module. 

Please re-open this bug if you find any indication of that it's actually
subprocess that does something wrong. 

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-14 00:37

Message:
Logged In: YES 
user_id=310088
Originator: YES

Hi Peter,

At the very beginning, it seems the fds are the same in the child
processes running transcode in each implementation (C, Python).

With the C version, I got:

total 5
dr-x------  2 flo users  0 2007-01-14 00:12 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:12 ..
lrwx------  1 flo users 64 2007-01-14 00:12 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:12 1 ->
/home/flo/tmp/transcode-test/extract_frame.output
l-wx------  1 flo users 64 2007-01-14 00:12 2 ->
/home/flo/tmp/transcode-test/extract_frame.output
lr-x------  1 flo users 64 2007-01-14 00:12 3 -> pipe:[41339]
lr-x------  1 flo users 64 2007-01-14 00:12 4 -> pipe:[41340]

With the Python version, I got:

total 5
dr-x------  2 flo users  0 2007-01-14 00:05 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:05 ..
lrwx------  1 flo users 64 2007-01-14 00:05 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:05 1 ->
/home/flo/tmp/transcode-test/extract_frame.py.output
l-wx------  1 flo users 64 2007-01-14 00:05 2 ->
/home/flo/tmp/transcode-test/extract_frame.py.output
lr-x------  1 flo users 64 2007-01-14 00:05 3 -> pipe:[40641]
lr-x------  1 flo users 64 2007-01-14 00:05 4 -> pipe:[40642]

That's the only thing I managed to get with the C version. But with the
Python version, if I don't list the contents of /proc/<pid>/fd immediately
after the transcode process started, I get this instead:

total 3
dr-x------  2 flo users  0 2007-01-14 00:05 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:05 ..
lrwx------  1 flo users 64 2007-01-14 00:05 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:05 1 ->
/home/flo/tmp/transcode-test/extract_frame.py.output
l-wx------  1 flo users 64 2007-01-14 00:05 2 ->
/home/flo/tmp/transcode-test/extract_frame.py.output

No pipes anymore. Only the 3 standard fds.

Note: I performed these tests with the .mpg file that does *not* cause the
"Broken pipe" message to appear; therefore, the broken pipe in question is
probably unrelated to those we saw disappear in this experiment (transcode
launches several processes such as tcdecode, tcextract, etc. all
communicating via pipes; I suppose the "Broken pipe" message shows up when
one of these programs fails, for reasons we have yet to discover).

Regarding your mentioning of close_fds, if I am not mistaken, it's only an
optional argument of subrocess.Popen(). I did try to set it to True when
first running into the problem, and it didn't help. But now, I am using
basic fork() and execvp() (see the attachments), so there is no such
close_fds option, right?

Thanks.

Florent

----------------------------------------------------------------------

Comment By: Peter Åstrand (astrand)
Date: 2007-01-13 23:14

Message:
Logged In: YES 
user_id=344921
Originator: NO

The first thing to check is if the subprocesses have different sets up
file descriptors when you launch them from Python and C, respectively. On
Linux, do /proc/$thepid/fd in both cases and compare the output.  Does it
matter if you use close_fds=1?

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:52

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: extract_frame.py.output2

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:52

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: extract_frame.py.output

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:51

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: extract_frame.output2

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:50

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: extract_frame.output

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:49

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: extract_frame.py

----------------------------------------------------------------------

Comment By: Florent Rougon (frougon)
Date: 2007-01-13 16:49

Message:
Logged In: YES 
user_id=310088
Originator: YES

File Added: Makefile

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1634739&group_id=5470


More information about the Python-bugs-list mailing list