[stdlib-sig] Processing module inclusion into the stdlib proposal
Jesse Noller
jnoller at gmail.com
Tue Mar 18 16:44:10 CET 2008
I have started work on a PEP proposing the inclusion of the PyProcessing
module (http://pypi.python.org/pypi/processing/ and
http://developer.berlios.de/projects/pyprocessing) in the stdlib in an
upcoming release.
The pyprocessing module "mostly mimics" the threading module API to
provide a "drop-in" process-based approach to concurrency, allowing
Python applications to utilize multiple cores. For example:
from threading import Thread

class threads_object(Thread):
    def run(self):
        function_to_run()

becomes:

from processing import Process

class process_object(Process):
    def run(self):
        function_to_run()
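To make the "drop-in" claim concrete, here is a small, self-contained sketch of the same pattern using only the stdlib threading module (the function body and worker count are illustrative, not from the post); per the above, installing the processing package and swapping Thread for Process would give the process-based version:

```python
from threading import Thread

results = []

def function_to_run():
    # Illustrative workload; appending under the GIL is safe here.
    results.append(sum(range(10)))

class threads_object(Thread):
    def run(self):
        function_to_run()

# Start four workers and wait for them to finish -- the same
# start()/join() lifecycle the processing module's Process mirrors.
workers = [threads_object() for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sum(results))  # 4 workers x 45 each -> 180
```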
Currently, the module runs on Unix/Linux/OSX and Windows. It supports
the following features:
* Objects can be transferred between processes using pipes or
multi-producer/multi-consumer queues.
* Objects can be shared between processes using a server process or
(for simple data) shared memory.
* Equivalents of all the synchronization primitives in ``threading``
are available.
* A ``Pool`` class makes it easy to submit tasks to a pool of worker
processes.
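As a sketch of the multi-producer/multi-consumer queue pattern the bullets describe, here is the stdlib threading/Queue equivalent (worker function, sentinel shutdown, and item counts are my own illustration); the processing module exposes the same Queue interface for transferring objects between processes:

```python
from threading import Thread
from queue import Queue  # the "Queue" module in Python 2

def worker(tasks, done):
    # Pull work items until a None sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:
            break
        done.put(item * item)

tasks, done = Queue(), Queue()
threads = [Thread(target=worker, args=(tasks, done)) for _ in range(2)]
for t in threads:
    t.start()

for n in range(5):
    tasks.put(n)
for _ in threads:
    tasks.put(None)  # one shutdown sentinel per worker
for t in threads:
    t.join()

results = sorted(done.get() for _ in range(5))
print(results)  # squares of 0..4 -> [0, 1, 4, 9, 16]
```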
In addition to the local, "just like threading" concurrency model, the
processing module also lets users share data and distribute work across
a cluster of machines via its server Manager and Proxy objects (data is
transferred via pickles, and secured data transfer is supported).
I believe that prior to (or during) inclusion, additional tests will be
wanted to improve coverage, but the tests already provided exercise and
showcase the module well.
I have spoken to the author, Richard Oudkerk, about his willingness to
maintain the processing module according to normal stdlib requirements,
and he is more than willing to do so.
I believe inclusion in the standard library will be very beneficial both
for people looking to build larger-scale applications and for those
seeking a way to side-step the current threading/GIL implementation.
This module lets users easily exploit their "$N core" machines, in a
fashion they are already familiar with.
I would suggest placing the module at the top level, next to the
threading module; however, there is also the thought that both this
module and the threading module should be moved into a concurrent.*
namespace (i.e. concurrent.threading, concurrent.processing) to allow
for additional library inclusions at a later date.
IMHO, this is simply a "first step" in the evolution of the Python
stdlib to support these sorts of things - but I also believe it is an
excellent first step.
Please feel free to ask questions, offer comments, etc. The more
feedback, the better the PEP will be!
-jesse