[Baypiggies] Guido's Blog: It isn't Easy to Remove the

Warren DeLano warren at delsci.com
Sat Sep 15 23:01:57 CEST 2007


Fazal,

> That said, if your interpreters don't share objects, you 
> might as well put them in separate processes.

Well, that is certainly an opinion held by some, but as an application
developer, I strongly disagree and expect that others would as well. 

There are many small-to-medium sized problems optimally addressed via
multiple lightweight threads within a single process.  It is not so much
an issue of needing to share objects simultaneously, but rather one of
being able to easily transfer objects between interpreters without
incurring serialization overhead.  
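
A minimal sketch of that distinction, using today's threads rather than
the sub-interpreters I am asking for: a queue.Queue between threads
hands over a reference to the very same object, whereas a pickle
round-trip (used here only as a stand-in for what crossing a process
boundary typically costs) builds a copy.  The function names are
illustrative, not part of any existing API.

```python
import pickle
import queue
import threading


def handoff_by_reference(obj):
    """Pass obj to another thread through a queue; nothing is serialized."""
    channel = queue.Queue()
    received = []

    def consumer():
        received.append(channel.get())

    worker = threading.Thread(target=consumer)
    worker.start()
    channel.put(obj)
    worker.join()
    return received[0]


def handoff_by_serialization(obj):
    """Round-trip obj through pickle, standing in for a process boundary."""
    return pickle.loads(pickle.dumps(obj))


data = list(range(1000))
same = handoff_by_reference(data)          # the identical object arrives
copied = handoff_by_serialization(data)    # an equal but distinct object arrives
```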

In the scientific computing & visualization space, an increasingly
dominant design pattern is data-flow, where streams of data objects are
passed through networks of nodes where each node processes and/or
transforms objects asynchronously and in parallel.  These networks can be
highly branched and complex, but fine-grained locking and complex
synchronization are completely unnecessary since only one node can own &
message any given object at a time.  
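
As a rough illustration of the pattern (not of any particular
framework), here is a linear network of nodes built from ordinary
threads and queues.  Each item sits in exactly one queue or one node at
a time, so no locks appear beyond those inside queue.Queue.  The names
node, run_pipeline and _DONE are made up for this sketch, and under the
GIL these threads still do not execute Python bytecode simultaneously,
which is exactly the limitation under discussion.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def node(transform, inbox, outbox):
    """Start one data-flow node: take each item, transform it, pass it on."""
    def run():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown downstream
                return
            outbox.put(transform(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(items, transforms):
    """Chain nodes with queues; each item is held by one node at a time."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [node(t, queues[i], queues[i + 1])
               for i, t in enumerate(transforms)]
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


result = run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
```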

At present, CPython cannot support this particular design pattern, but
it likely could through addition of concurrent sub-interpreters along
with a robust mechanism for transferring objects between them.  

Consider: any object with a reference count of one could simply be
"sent" (disappear) from one interpreter and then appear in another
without copying or serialization.  Likewise, objects with reference
counts greater than one would instead be (deep) copied when sent.  That
is a simple and elegant solution that would "just work" as expected --
no batteries required.  
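
To show only the move-or-copy decision, here is a sketch that stays
inside a single interpreter.  It assumes CPython (sys.getrefcount is an
implementation detail), the names send and box are hypothetical, and it
looks only at the top-level object, not at anything the object
contains.  A real transfer between interpreters would need support in
the runtime itself.

```python
import copy
import sys


def send(box):
    """Hypothetical transfer: take the object out of a one-slot list `box`.

    Returns (obj, moved).  If nothing but `box` referenced the object, it
    is handed over as-is; otherwise a deep copy is returned instead.
    """
    obj = box.pop()
    probe = object()  # fresh object known to have a single owner: this frame
    # Compare against the probe rather than a hard-coded number, because the
    # value reported by sys.getrefcount includes interpreter-specific overhead.
    if sys.getrefcount(obj) <= sys.getrefcount(probe):
        return obj, True
    return copy.deepcopy(obj), False


# Case 1: the box holds the only reference, so the object is moved.
unique_box = [{"payload": [1, 2, 3]}]
moved_obj, was_moved = send(unique_box)

# Case 2: the sender keeps a second reference, so the receiver gets a copy.
kept = {"payload": [4, 5, 6]}
shared_box = [kept]
copied_obj, was_moved_2 = send(shared_box)
```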

Sure, one could implement something like this through use of shared
memory between processes with one dedicated process per asynchronous
node, but that would be a complex workaround with significant overhead.
Indeed, we already have numerous solutions for process-level parallelism
-- that is really not what this is about.

Concisely stated, the major unmet need with respect to parallel CPython
is the ability to easily create single-process shared-memory parallel
programs that optimally exploit multi-core CPUs -- and I predict that
the urgency of this need will double every eighteen months for the
foreseeable future.  Realistically, it is not something that can be
safely ignored...

Thus, this need is something that really should be addressed by CPython
3.0, if possible.  If some limited amount of code needs to be broken to
enable intraprocess parallelism, then the boundary of the major release
is the time to do it.  Without such capabilities, I believe that the
CPython VM will eventually become uncompetitive and its usage will wane.


We do not want a JVM or CLI monkey on our back just to be able to write
parallel Python programs!  But like Guido says:  Quit your complaining
-- and go find a solution!  ...so what are [we] going to do about it?

Though I personally lack the knowledge, skill, and free time to heavily
modify Python source, my company would gladly commit financial resources
(tens of kilobucks) to bring about concurrent sub-interpreters in
CPython 3.0, if there were a credible means of doing so in a way that
would satisfy our esteemed BDFL.

The requirement for a super-interpreter doesn't concern me at all.
Ultimately, some entity has to be responsible for set-up, management,
and take-down of concurrent interpreters, so why not use what already
exists in CPython: the global singleton master interpreter?  Guido has
made it perfectly clear that it isn't going away, so let's work with
what we've got and find a pragmatic way of meeting this need.

(Any bounty hunters lurking?)

Cheers,
Warren

> -----Original Message-----
> From: Fazal Majid [mailto:fmajid at kefta.com] 
> Sent: Friday, September 14, 2007 6:42 PM
> To: Warren DeLano
> Subject: Re: [Baypiggies] Guido's Blog: It isn't Easy to Remove the
> 
> On Sep 14, 2007, at 17:04 , Warren DeLano wrote:
> 
> > Call me crazy, but it seems to me that the problem isn't so much the
> > GIL, but rather, it is our inability to simultaneously run multiple
> > Python interpreters within a single process, with each interpreter
> > playing nicely with native C extensions and libraries.
> 
> mod_python does embed multiple interpreters in a single process.
> That said, if your interpreters don't share objects, you 
> might as well put them in separate processes. If they do 
> share state, coordinating garbage collection, atomic 
> dictionary updates and so on would require creating such 
> overhead that you would have a de-facto super-interpreter 
> that would have its own GIL.
> 
> What we need to do is look at the design patterns for 
> concurrent applications and ensure they are well supported by 
> Python. Off the top of my head:
> 
> Divide and conquer (worker pools)
> ---------------------------------
> Spread computationally intensive work among multiple parallel 
> processes, possibly across different machines in a cluster
> 
> Needs good IPC (Queue-like) to distribute work, and 
> mechanisms to start worker processes on demand and manage them.
> 
> Resource pools
> --------------
> Manage a pool of precious resources (e.g. database connections).
> 
> If resource utilization is high, the pool can easily be split 
> across multiple processes. If resource utilization is low 
> because the GIL is the bottleneck, the application can be 
> restructured to move the computation to worker pools.
> 
> Caches
> ------
> The threads share a common cache. Splitting the cache into 
> multiple sub-caches is not desirable because it would reduce 
> hit rates (a single big cache has better hit rates than 
> multiple small caches).
> 
> This can be worked around by doing hash partitioning of the cache.  
> Other mechanisms involve shared memory or database-like approaches.
> 
> ----------------------------------------------------------------
> Fazal Majid  | Acxiom Digital - Kefta
> 415-391-6881 x8014 office | 415-391-7079 fax
> One Kearny Street, 9th Floor | San Francisco, CA 94109 | USA | www.acxiom.com
> ACXIOM(r)   WE MAKE INFORMATION INTELLIGENT(TM)


