[Tutor] Newbie Wondering About Threads

Martin Walsh mwalsh at mwalsh.org
Sun Dec 7 06:33:15 CET 2008


Damon Timm wrote:
> On Sat, Dec 6, 2008 at 6:25 PM, Python Nutter <pythonnutter at gmail.com> wrote:
>> I'm on my phone so excuse the simple reply.
>> From what I skimmed you are wrapping shell commands which is what I do
>> all the time. Some hints. 1) look into popen or subprocess in place of
>> execute for more flexibility. I use popen a lot and assigning a popen
>> call to an object name lets you parse the output and make informed
>> decisions depending on what the shell program outputs.
> 
> So I took a peek at subprocess.Popen --> looks like that's the
> direction I would be headed for parallel processes ... a real simple
> way to see it work for me was:
> 
> p2 = subprocess.Popen(["lame","--silent","test.wav","test.mp3"])
> p3 = subprocess.Popen(["lame","--silent","test2.wav","test2.mp3"])
> p2.wait()
> p3.wait()
> 
> top showed that both cores get busy and it takes half the time!  So
> that's great -- when I tried to add the flac decoding through stdout I
> was able to accomplish it as well ... I was mimicking the command of
> "flac --decode --stdout test.flac | lame - test.mp3" ... see:
> 
> p = subprocess.Popen(["flac","--decode","--stdout","test.flac"],
> stdout=subprocess.PIPE)
> p2 = subprocess.Popen(["lame","-","test.mp3"], stdin=subprocess.PIPE)
> p2.communicate(p.communicate()[0])
> 
> That did the trick - it worked!  However, it was *very* slow!  The
> python script has a "real" time of 2m22.504s whereas if I run it from
> the command line it is only 0m18.594s.  Not sure why this is ...

I'm not certain this completely explains the poor performance, but the
communicate method of a Popen object waits until EOF is reached and the
process ends, buffering all of its output in memory along the way. So
IIUC, in your example the process 'p' runs to completion, its entire
decoded output is collected into a Python string (p.communicate()[0]),
and only then is that string passed to the stdin of 'p2' by the outer
communicate call -- so the two programs never actually run in parallel.

You might try something like this (untested!) ...

p1 = subprocess.Popen(
    ["flac","--decode","--stdout","test.flac"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p2 = subprocess.Popen(
    ["lame","-","test.mp3"], stdin=p1.stdout, # <-- connect the pipe directly
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
p1.stdout.close() # so p1 gets a SIGPIPE if p2 exits early
p2.communicate()

... where the stdin of 'p2' is connected directly to the stdout of 'p1',
so both programs run at the same time and the decoded audio never has
to pass through Python.

> 
> The last piece of my puzzle though, I am having trouble wrapping my
> head around ... I will have a list of files
> ["file1.flac","file2.flac","file3.flac","etc"] and I want the program
> to tackle compressing two at a time ... but not more than two at a
> time (or four, or eight, or whatever) because that's not going to help
> me at all (I have dual cores right now) ... I am having trouble
> thinking how I can create the algorithm that would do this for me ...

Interesting problem, and not an easy one IMHO, unless you're content to
wait for a pair of processes to complete before starting two more. In
that case you can just grab two filenames at a time from the list,
start both Popen pipelines, and wait for (or communicate with) both
before continuing with the next pair.
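
For example, a rough (untested!) sketch of that simpler approach, using
a made-up flac_to_mp3 helper to start the flac-to-lame pipeline for one
file, might look like ...

import subprocess

def flac_to_mp3(flac_path):
    # made-up helper: start the flac | lame pipeline for one file and
    # return the lame process (naively assumes a .flac extension)
    mp3_path = flac_path[:-len(".flac")] + ".mp3"
    p1 = subprocess.Popen(
        ["flac", "--decode", "--stdout", flac_path],
        stdout=subprocess.PIPE
    )
    p2 = subprocess.Popen(["lame", "-", mp3_path], stdin=p1.stdout)
    p1.stdout.close()  # so p1 gets a SIGPIPE if p2 exits early
    return p2

paths = ["file1.flac", "file2.flac", "file3.flac"]
while paths:
    # start up to two conversions, then wait for both before continuing
    pair = [flac_to_mp3(paths.pop(0)) for _ in range(min(2, len(paths)))]
    for proc in pair:
        proc.wait()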

But since you probably want your script to stay busy, and it's
reasonable to assume (I think!) that one of the processes may finish
much sooner or much later than the other... well, it is a bit tricky
(for me, anyway).

Here is my simplistic, not-very-well-thought-out attempt in pseudo-code;
perhaps it will get you started ...

paths = ["file1.flac","file2.flac", ... "file11.flac"]
procs = []
while paths or procs:
    procs = [p for p in procs if p.poll() is None]
    while paths and len(procs) < 2:
        flac = paths.pop(0)
        procs.append(Popen(['...', flac], ...))
    time.sleep(1)

The idea here is to keep track of running processes in a list, remove
them once they've terminated, and start (append) new processes as
needed up to the desired maximum, looping for as long as there are
files remaining or processes still running.
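
Filled out a little (again untested!), with the placeholder Popen call
swapped for the same made-up flac_to_mp3 helper from the sketch above,
it might look something like ...

import time

paths = ["file1.flac", "file2.flac", "file3.flac", "file4.flac"]
max_procs = 2               # match the number of cores you want busy
procs = []
while paths or procs:
    # keep only the conversions that are still running
    procs = [p for p in procs if p.poll() is None]
    # top the pool back up while there are files left to convert
    while paths and len(procs) < max_procs:
        procs.append(flac_to_mp3(paths.pop(0)))
    time.sleep(1)           # don't spin flat out while polling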

HTH,
Marty



