[Python-Dev] Addition of "pyprocessing" module to standard lib.

Jesse Noller jnoller at gmail.com
Wed May 14 17:57:51 CEST 2008


On Wed, May 14, 2008 at 11:46 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-05-14 14:15, Jesse Noller wrote:
>>
>> On Wed, May 14, 2008 at 5:45 AM, Christian Heimes <lists at cheimes.de>
>> wrote:
>>>
>>> Martin v. Löwis wrote:
>>>
>>>> I'm worried whether it's stable, what user base it has, whether users
>>>> (other than the authors) are lobbying for inclusion. Statistically,
>>>> it seems to be not ready yet: it is not even a year old, and has not
>>>> reached version 1.0 yet.
>>>
>>> I'm on Martin's side here. Although I would like to see some sort of
>>> multiprocessing mechanism in Python, because I need it for lots of
>>> projects, I'm against the inclusion of pyprocessing in 2.6 and 3.0.
>>> The project isn't old and mature enough, and it has competitors such
>>> as pp (Parallel Python).
>>>
>>> On the one hand, the inclusion of a package gives it an unfair
>>> advantage over similar packages. On the other hand, it slows down
>>> future development, because a new feature release must be synced with
>>> Python releases about every 1.5 years.
>>>
>>> -0.5 from me
>>>
>>> Christian
>>>
>>
>> I said this in reply to Martin, but the competitors (in my mind) are
>> not as compelling, due to the "alternative" paradigm of application
>> construction they propose. The processing module is an "easy win" for
>> us if included.
>>
>> Personally, I don't see how inclusion in the stdlib would slow down
>> development. Yes, you have to stick to the same release cycle as
>> python-core, but if the module is "feature complete" and provides a
>> stable API as it stands, I don't see following python-core timelines
>> as overly onerous.
>>
>> The module itself doesn't change that frequently - the last release,
>> in April, was a bugfix release with an API-consistency change (the API
>> would obviously have to be locked for inclusion; targeting a 2.7/3.1
>> release may be advantageous in achieving this).
>
> Why don't you start a parallel-sig and hash this out with other
> distributed computing users?
>
> You could then reach a decision by the time 2.7 is scheduled for
> release, and add the chosen module to the stdlib.
>
> The API of the processing module does look simple and nice, but
> parallel processing is a minefield - especially when it comes to
> handling error situations (e.g. a worker failing, the network going
> down, fail-over, etc.).
>
> What I'm missing with the processing module is a way to spawn processes
> on clusters (rather than just on a single machine).
>
> In the scientific world, MPI is the standard API of choice for doing
> parallel processing, so if we're after standards, supporting MPI
> would seem to be more attractive than the processing module.
>
>    http://pypi.python.org/pypi/mpi4py
>
> In the enterprise world, you often find CORBA-based solutions.
>
>    http://omniorb.sourceforge.net/
>
> And then, of course, you have a gazillion specialized solutions
> such as PyRO:
>
>    http://pyro.sourceforge.net/
>
> OTOH, perhaps the stdlib should just include entry-level support
> for some form of parallel processing, in which case processing
> does look attractive.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>

Thanks for bringing up something I was going to mention - I am not
attempting to "solve" the distributed computing problem with this
proposal. You are right that there is a variety of technologies out
there for achieving "true" loosely-coupled distributed computing,
including all of those you pointed out (the MPI style, for one, is a
genuinely different paradigm - see the sketch below).
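
To illustrate what I mean by a different paradigm, here is a minimal
sketch of MPI-style SPMD code using mpi4py. This is my own hypothetical
example, not taken from the mpi4py docs; it assumes mpi4py's
MPI.COMM_WORLD, Get_rank/Get_size, and scatter calls:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id within the job
    size = comm.Get_size()   # total number of processes in the job

    if rank == 0:
        # Only the root process prepares the data...
        data = [x * x for x in range(size)]
    else:
        data = None
    # ...and every process receives exactly one item of it.
    item = comm.scatter(data, root=0)
    print "rank %d got %r" % (rank, item)

You launch every copy of the program at once (e.g. "mpiexec -n 4
python demo.py") and structure the whole application around ranks -
a very different shape than spawning workers from a single parent.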

I am proposing exactly what you mentioned: entry-level parallel
processing. The fact that the processing module has remote capabilities
is a bonus, not core to the proposal. In a "perfect" world a system
might exist which truly insulates programmers from the difference
between local concurrency and distributed systems, but the two are
really different problems. My concern is taking advantage of the 8-core
machine sitting under my desk (or the 10 or so I have in the lab) - the
processing module lets me do that easily, as the sketch below shows.
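
To make the "easy win" concrete, here is a minimal sketch of fanning
work out across local cores with the module's Pool class. I'm going
from pyprocessing's documented names here (Pool and Pool.map), so
treat it as illustrative rather than canonical:

    from processing import Pool

    def square(x):
        return x * x

    if __name__ == '__main__':
        # Eight worker processes - one per core. Because these are
        # real OS processes, the GIL is not a bottleneck.
        pool = Pool(8)
        print pool.map(square, range(100))

The call shape mirrors the builtin map(), which is a big part of why
I keep calling this an easy win.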

The module is basic enough to leave you free to pair it with other
technologies for highly distributed systems, but it is also simple
enough to act as an entry point for people just starting out in the
domain. Think of it like the difference between asyncore and Twisted.
I could easily see more loosely coupled, highly distributed tools
being built on top of the basics it provides - see the sketch below.
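
And because the API intentionally mirrors the threading module, the
on-ramp is familiar. Another minimal sketch, again assuming
pyprocessing's documented names (Process and Queue; the worker is my
own toy example):

    from processing import Process, Queue

    def worker(q):
        # Runs in a separate OS process, not a thread.
        q.put('hello from a child process')

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=worker, args=(q,))
        p.start()
        print q.get()   # blocks until the child puts its result
        p.join()

Anyone who has written threading.Thread code can read that at a
glance, which is exactly the entry-point quality I'm after.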

-jesse

