[Tutor] Threading in Python

Eric Brunson brunson at brunson.com
Thu Sep 20 06:48:40 CEST 2007


James wrote:
> Thanks for the quick reply.
>
> Interesting.  I'm a little overwhelmed with the different terminology  
> (fork, spawn, thread, etc.).  I'm under the impression that I'm  
> supposed to use os.fork() or os.spawn() for something like what I'm  
> trying to do (start multiple instances of the I/O utility from one  
> Python script).
>   

A fork is a fundamental system call in which the OS makes a nearly 
identical copy of the running process.  I know it's a kind of *-hole 
thing to say, but... if you don't know why you'd want to fork your 
process, you probably don't need to.  Forking is usually used for 
disassociating yourself from your parent process to become a daemon.  
However, it's a basic function of the system and intrinsic to many other 
higher-level actions.
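
To make the "nearly identical copy" idea concrete, here is a minimal sketch 
of a bare fork on a Unix-like system.  The function name fork_and_wait and 
the exit status 7 are just made up for illustration.

```python
import os

def fork_and_wait():
    """Fork once; the child exits with status 7 and the parent collects it."""
    pid = os.fork()  # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        # Child: a near-identical copy of the parent.  Leave right away with
        # os._exit so the copy does not run the parent's cleanup handlers.
        os._exit(7)
    # Parent: block until the child finishes, then decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(fork_and_wait())  # prints 7
```

Both processes return from the same os.fork() call; the return value is the 
only way each one can tell which side it is on.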

One you don't mention is "exec", which is to replace your running 
process image with a new process image.  You can do it from the shell, 
type "exec somebinary" and that binary replaces your shell process, so 
when the exec'd process exits, your session is terminated.

I mention that because when you combine a fork with an exec, you get a 
spawn.  Your parent process duplicates itself, but the child process 
chooses to exec another program.  So the child copy of the initial 
process is replaced by a new running binary and you have a spawned process 
running as a child of the first.
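
As a rough sketch of fork plus exec on a Unix-like system (the helper name 
spawn and the exit status 127 on failure are my own choices here, not 
anything standard in Python):

```python
import os
import sys

def spawn(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's process image with the new program.
            # execvp only comes back (by raising OSError) if it fails.
            os.execvp(argv[0], argv)
        finally:
            # Reached only when the exec failed; never fall back into
            # the parent's code from inside the child.
            os._exit(127)
    # Parent: wait for the spawned program and report how it exited.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    # Stand-in for the real utility: a tiny Python child that exits with 3.
    print(spawn([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

In everyday code the standard subprocess module does this dance for you, 
so you rarely need to write it by hand.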

Finally, a thread (sometimes referred to as a "lightweight process" or 
"lwp") is kind of like a fork, except a fork duplicates everything about 
the initial process (except the return value of the fork call) while a 
thread shares state with the parent process and all its sibling threads.
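
Here is a small sketch of that difference, assuming a Unix-like system 
(thread_vs_fork is a made-up name): a thread's change to a list is visible 
to the code that started it, while a forked child only changes its own copy.

```python
import os
import threading

def thread_vs_fork():
    shared = []

    # A thread runs in the same address space, so its append shows up here.
    t = threading.Thread(target=shared.append, args=("thread",))
    t.start()
    t.join()

    # A forked child works on its own copy of the list; its append never
    # reaches the parent.
    pid = os.fork()
    if pid == 0:
        shared.append("child")
        os._exit(0)
    os.waitpid(pid, 0)

    return shared

if __name__ == "__main__":
    print(thread_vs_fork())  # prints ['thread']
```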

The interesting thing about a Python thread is that, in the standard C 
implementation at least, it is a real OS-level thread, but it can only 
execute Python code while holding the global interpreter lock (GIL).  So, 
while a dual-processor computer can execute two different processes or 
threads simultaneously, since there's only one interpreter lock (per 
Python process), two Python threads in the same process never execute 
Python code at the same moment.  A thread that is blocked on I/O, or 
waiting for a child process to finish, releases the lock so the other 
threads can run.

> However, from what I gather from the documentation, os.fork() is  
> going to fork the Python script that calls the original fork  
> (so fork won't directly start off the programs that I need).  How  
> would I go about forking + then executing an application?  Isn't this  
> what spawn does?  Or should I stick with fork + exec*?
>   

However, what you are trying to do, i.e. spawn multiple concurrent child 
processes, could actually take advantage of a multiprocessor system 
using Python threads.  If you created multiple threads, each of which 
spawned an independent subprocess, those subprocesses could be executed 
concurrently by different processors, while their status was still being 
coordinated via the Python thread model.
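
A minimal sketch of that design, using the standard threading and 
subprocess modules.  The name run_all is hypothetical, and the command 
lists below are stand-ins for however your I/O utility is really invoked.

```python
import subprocess
import sys
import threading

def run_all(commands):
    """Run each command in its own thread; return exit codes in input order."""
    results = [None] * len(commands)

    def worker(index, cmd):
        # subprocess.run blocks only this thread while the child runs, so
        # the children themselves execute in parallel as separate processes.
        results[index] = subprocess.run(cmd).returncode

    threads = [
        threading.Thread(target=worker, args=(i, cmd))
        for i, cmd in enumerate(commands)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # Stand-in children: three tiny Python processes exiting with 0, 1, 2.
    cmds = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n]
            for n in range(3)]
    print(run_all(cmds))  # prints [0, 1, 2]
```

Each thread spends nearly all its time waiting on its child, so the Python 
side costs very little and the OS is free to schedule the children on 
whatever processors it has.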

Just give it a go knowing that it is an efficient design, and drop us a 
line if you have more questions or any problems.

Sincerely,
e.

> Lots to learn, I guess.  ;)
>   

Always.  ;-)  When you think there's nothing else to learn, you've 
already become obsolete.

> .james
>
> On Sep 19, 2007, at 10:19 PM, Kent Johnson wrote:
>
>   
>> James wrote:
>>     
>>> Hi.  :)
>>> I have a question regarding threading in Python.  I'm trying to  
>>> write  a wrapper script in Python that will spin off multiple  
>>> (lots!) of  instances of an I/O benchmark/testing utility.  I'm  
>>> very interested  in doing this in Python, but am unsure if this is  
>>> a good idea.  I  thought I read somewhere online that because of  
>>> the way Python was  written, even if I spun off (forked off?)  
>>> multiple instances of a  program, all those child processes would  
>>> be restricted to one CPU.   Is this true?
>>>       
>> Python *threads* are limited to a single CPU, or at least they will  
>> not run faster on multiple CPUs. I don't think there is any such  
>> restriction for forked processes.
>>
>>     
>>> I'm not sure if the utility I'm forking is CPU-intensive; it may  
>>> very  well be.  Does Python indeed have this limitation?
>>>       
>> I would think an I/O benchmark is more likely to be I/O bound...
>>
>>     
>>> Also, should I be using os.fork() for this kind of program?
>>>       
>> There is a fair amount of activity these days around making Python  
>> friendly to multi-processing. See
>> http://wiki.python.org/moin/ParallelProcessing
>>
>> Kent
>>     
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>   


