Newbie Threads question

Sun May 2 12:10:00 EDT 1999

[mbf2y at my-dejanews.com]
> ...
> What I'd like to do is learn how to use threads by "parallelizing" a
> simple set of now currently serialized tasks.
> ...
> hypothetically my code looks like this:
>
> def func1(param):
>    result = []
>    # do a bunch of work, takes about 3 seconds of wall-clock time
>    return result

When a thread function returns, the thread dies, and the return value is
tossed into the bit bucket.  You'll need to stuff the result away in a
non-local variable of some kind.

> def func2(param):
>    result = []
>    # do a bunch of work, takes about 5 seconds of wall-clock time
>    return result

Ditto.

> def func3(list1, list2):
>    result = []
>    # does work on the two lists passed in, takes 0.5 secs of wall-clock time
>    return result
>
>
> #my main loop simplified
> list1 = func1(x)
> list2 = func2(y)
> reallist = func3(list1,list2)
> #do something with reallist.
>
> Since func1 and func2 are completely independent - i.e., do not use
> any of the same resources, to me this would be a great place to do
> the work in parallel.

Yup!  In the biz, this is what's called "embarrassingly parallel".  I worked
for at least one now-defunct startup that tried to get rich off non-problems
exactly like that <wink>.

>  What I want to do is something like this:
>
> #my new main loop
> list1 = thread.start_new_thread(func1,(x))
> list2 = thread.start_new_thread(func2,(y))
> reallist = func3(list1,list2)
>
> I have read the documentation a couple of times, but what I don't know:
>
> A) Is the return value of start_new_thread the same as the return
> value of the function it calls?

No.  Expanding on Gordon's hint, if start_new_thread waited for func1 to
return a value, nothing at all would happen in parallel (the second call to
start_new_thread couldn't begin before func1 returned).  So it returns right
away instead, long before func1 has any result to offer.
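
Here's a minimal sketch of that, assuming a modern Python 3 (where the old
thread module is spelled _thread).  The body of func1, the "done" event and
the "result" list are all made up for illustration.  What comes back from
start_new_thread is a thread identifier, delivered immediately, and not
whatever func1 eventually returns:

```python
import _thread
import threading
import time

done = threading.Event()   # hypothetical signal so the main thread can wait
result = []                # non-local place to stash the worker's answer

def func1(param):
    time.sleep(0.5)        # stand-in for the real work
    result.append(param * 2)
    done.set()             # tell the main thread we're finished
    return result          # this return value is simply discarded

start = time.monotonic()
ident = _thread.start_new_thread(func1, (21,))
elapsed = time.monotonic() - start   # small: the call did not wait for func1

done.wait()                # block until the worker has stored its answer
print(type(ident).__name__, result)
```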

Note too that you need to pass a tuple of arguments in the call, and (x)
isn't a tuple.  A 1-tuple is a degenerate case that needs to be spelled (x,)
(note the silly-looking trailing comma there).
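
A tiny check of the difference, with a made-up value for x.  Parentheses
alone only group an expression; it's the comma that builds the tuple:

```python
x = 5
not_a_tuple = (x)    # parentheses only group; this is just the int 5
one_tuple = (x,)     # the trailing comma is what makes it a 1-tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))
```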

> B) Do I need to do anything fancy to make the func3 call wait
> until the calls to func1 and func2 have both returned?

Absolutely.

> For example, should I have the func1 and func2 calls acquire a lock at
> the start of the function and release it at the end, and then make the
> main loop try to acquire both locks before proceeding to the
> reallist = func3() call?

What you "should do" is use the higher-level threading module's "join"
method.  Rolling your own is fraught with peril.  For example, consider your
suggestion:

def func1(...):
    acquire lock1
    do work
    release lock1

# func2 similarly, but with lock2

# main loop
    start func1
    start func2
    acquire lock1
    acquire lock2

It's quite possible that the main thread will start func1 and func2, and
acquire both lock1 and lock2 before any code in func1 or func2 gets a chance
to execute.  The main thread then sails on to func3 with no results to work
on, func1 and func2 hang waiting to acquire locks that will never get
released, and your main loop hangs too on the next trip around.

The kind of gimmick you're thinking of *can* work, but requires acquiring
the locks in the main loop *before* starting the threads; then it's
guaranteed that the thread function is entered with its lock in the acquired
state:

def func1(...):
    do work
    release lock1

# func2 similarly, but with lock2

acquire lock1
acquire lock2
# main loop
    start func1
    start func2
    acquire lock1
    acquire lock2
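
For the curious, here's one way that second scheme might look in runnable
form.  It's only a sketch under assumptions: it uses threading.Lock from a
modern Python 3 (which may be released by a thread other than the one that
acquired it), and the function bodies, the sleeps and the "rounds" list are
invented stand-ins for the real work:

```python
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()
list1, list2 = [], []          # non-local homes for the results

def func1(param):
    try:
        time.sleep(0.1)        # stand-in for the real work
        list1.extend(range(param))
    finally:
        lock1.release()        # signals "func1 is done"

def func2(param):
    try:
        time.sleep(0.2)        # stand-in for the real work
        list2.extend(i * i for i in range(param))
    finally:
        lock2.release()        # signals "func2 is done"

# Acquire *before* starting the threads, so each thread function is
# entered with its lock already in the acquired state.
lock1.acquire()
lock2.acquire()

rounds = []
for x in range(1, 3):          # main loop
    del list1[:]
    del list2[:]
    threading.Thread(target=func1, args=(x,)).start()
    threading.Thread(target=func2, args=(x,)).start()
    lock1.acquire()            # blocks until func1 releases lock1
    lock2.acquire()            # blocks until func2 releases lock2
    rounds.append((list(list1), list(list2)))

print(rounds)
```

Note that the main thread ends each trip around the loop holding both locks
again, which is exactly the state the next trip needs.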

I'll attach a less painful alternative using the "threading" module.
"threading" presents a scheme more-or-less like Java's thread API, so
getting a book on Java threads would be a good idea.

threads-tend-to-unravel-ly y'rs  - tim

This is executable as-is:

from threading import Thread
import time

# Each thread function gets a list to append to, since a thread
# function's return value is thrown away.

def square(n, answer):
    print("in square")
    for i in range(n):
        answer.append(i**2)
    time.sleep(3)   # simulate about 3 seconds of work
    print("returning from square")

def cube(n, answer):
    print("in cube")
    for i in range(n):
        answer.append(i**3)
    time.sleep(5)   # simulate about 5 seconds of work
    print("returning from cube")

for i in range(4):
    t1result = []
    t2result = []
    t1 = Thread(target=square, args=(2*i, t1result))
    t2 = Thread(target=cube, args=(3*i, t2result))
    print("starting threads with i =", i)
    t1.start(); t2.start()
    # join blocks until the thread has finished
    t1.join();  t2.join()
    print("back from joins")
    print("square returned", t1result)
    print("cube returned", t2result)