[Tutor] Threading in Python
Eric Brunson
brunson at brunson.com
Thu Sep 20 06:48:40 CEST 2007
James wrote:
> Thanks for the quick reply.
>
> Interesting. I'm a little overwhelmed with the different terminology
> (fork, spawn, thread, etc.). I'm under the impression that I'm
> supposed to use os.fork() or os.spawn() for something like what I'm
> trying to do (start multiple instances of the I/O utility from one
> Python script).
>
A fork is a fundamental system call in which the OS makes a nearly
identical copy of the running process. I know it's a kind of *-hole
thing to say, but... if you don't know why you'd want to fork your
process, you probably don't need to. Forking is usually used for
disassociating yourself from your parent process to become a daemon.
However, it's a basic function of the system and intrinsic to many other
higher-level actions.
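
To make the fork description concrete, here is a minimal sketch. It assumes a Unix-like system (os.fork() is not available on Windows), and the function name fork_and_wait is just something I made up for illustration.

```python
import os


def fork_and_wait(child_exit_code=7):
    """Fork once; the child exits right away and the parent collects its status."""
    pid = os.fork()  # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        # Child: a near-identical copy of the parent at this point.
        # os._exit() leaves immediately without running the parent's cleanup code.
        os._exit(child_exit_code)
    # Parent: block until the child has finished, then decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("child exited with", fork_and_wait())
```

The only thing that tells the two copies apart is the value os.fork() hands back, which is why the code branches on pid straight away.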
One you don't mention is "exec", which is to replace your running
process image with a new process image. You can do it from the shell,
type "exec somebinary" and that binary replaces your shell process, so
when the exec'd process exits, your session is terminated.
I mention that because when you combine a fork with an exec, you get a
spawn. Your parent process duplicates itself, but the child process
chooses to exec another program. So the child copy of the initial
process is replaced by the new running binary, and you have a spawned
process running as a child of the first.
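
A rough sketch of that fork + exec combination, again assuming a Unix-like system. The helper name spawn and the exit code 127 for a failed exec are my own choices for the example, not anything Python mandates.

```python
import os
import sys


def spawn(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the Python process with the new program.
        try:
            os.execvp(argv[0], argv)  # does not return if the exec succeeds
        except OSError:
            os._exit(127)  # exec failed (e.g. program not found)
    # Parent: wait for the spawned child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    # Stand-in for a real utility: a second Python interpreter that exits with 5.
    print(spawn([sys.executable, "-c", "import sys; sys.exit(5)"]))
```

In everyday code the standard library's subprocess module wraps this same pattern, so you rarely need to write the fork and exec by hand.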
Finally, a thread (sometimes referred to as a "lightweight process" or
"lwp") is kind of like a fork, except a fork duplicates everything about
the initial process (except the value the fork call returns) while a
thread shares state with the parent process and all its sibling threads.
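
The difference between "duplicates" and "shares" can be seen in a small experiment. This is only a sketch for a Unix-like system, and compare_fork_and_thread is a name invented for the example.

```python
import os
import threading


def compare_fork_and_thread():
    """Show that a forked child changes a copy, while a thread changes the original."""
    data = []

    # A forked child works on its own copy of the list; its append never
    # reaches the parent.
    pid = os.fork()
    if pid == 0:
        data.append("from child")
        os._exit(0)
    os.waitpid(pid, 0)

    # A thread runs inside the same process, so its append is visible here.
    t = threading.Thread(target=data.append, args=("from thread",))
    t.start()
    t.join()

    return data


if __name__ == "__main__":
    print(compare_fork_and_thread())
```

Only the thread's append shows up in the parent's list; the child's change dies with the child.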
The interesting thing about a Python thread is that, although it is a
separate thread of execution, it is still controlled by the Python
interpreter. So, while a dual-processor computer can choose to execute
two different processes or threads simultaneously, the interpreter's
global lock (one per Python process) means two threads in the same
Python process never execute Python code at the same moment. In CPython
each thread is in fact backed by an OS-level thread; the restriction
comes from that lock, which is released while a thread waits on I/O or
on a child process. It's more of a conceptual thing.
> However, from what I gather from the documentation, os.fork() is
> going to fork the Python script that calls the original fork
> (so fork won't directly start off the programs that I need). How
> would I go about forking + then executing an application? Isn't this
> what spawn does? Or should I stick with fork + exec*?
>
However, what you are trying to do, i.e. spawn multiple concurrent child
processes, could actually take advantage of a multiprocessor system
using Python threads. If you created multiple threads, each of which
spawned an independent subprocess, those subprocesses could be executed
concurrently by different processors, while their status was still being
coordinated via the Python thread model.
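
Here is one way that design might look, as a sketch rather than a finished tool. The function name run_all is made up, and the child commands are just extra Python interpreters standing in for your I/O utility, since I don't know its name or arguments.

```python
import subprocess
import sys
import threading


def run_all(commands):
    """Run each command as its own subprocess, one supervising thread per command."""
    results = [None] * len(commands)

    def worker(index, argv):
        # The thread blocks here while the child runs as a separate OS process,
        # so the children are free to run on different processors.
        results[index] = subprocess.run(argv).returncode

    threads = [
        threading.Thread(target=worker, args=(i, argv))
        for i, argv in enumerate(commands)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder commands; substitute the real utility and its arguments.
    cmds = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n] for n in range(3)]
    print(run_all(cmds))
```

Each thread spends nearly all of its time waiting on its child, so the one-thread-at-a-time behaviour of the interpreter doesn't hold the children back.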
Just give it a go knowing that it is an efficient design, and drop us a
line if you have more questions or any problems.
Sincerely,
e.
> Lots to learn, I guess. ;)
>
Always. ;-) When you think there's nothing else to learn, you've
already become obsolete.
> .james
>
> On Sep 19, 2007, at 10:19 PM, Kent Johnson wrote:
>
>
>> James wrote:
>>
>>> Hi. :)
>>> I have a question regarding threading in Python. I'm trying to
>>> write a wrapper script in Python that will spin off multiple
>>> (lots!) of instances of an I/O benchmark/testing utility. I'm
>>> very interested in doing this in Python, but am unsure if this is
>>> a good idea. I thought I read somewhere online that because of
>>> the way Python was written, even if I spun off (forked off?)
>>> multiple instances of a program, all those child processes would
>>> be restricted to one CPU. Is this true?
>>>
>> Python *threads* are limited to a single CPU, or at least they will
>> not run faster on multiple CPUs. I don't think there is any such
>> restriction for forked processes.
>>
>>
>>> I'm not sure if the utility I'm forking is CPU-intensive; it may
>>> very well be. Does Python indeed have this limitation?
>>>
>> I would think an I/O benchmark is more likely to be I/O bound...
>>
>>
>>> Also, should I be using os.fork() for this kind of program?
>>>
>> There is a fair amount of activity these days around making Python
>> friendly to multi-processing. See
>> http://wiki.python.org/moin/ParallelProcessing
>>
>> Kent
>>
>