Parallelization in Python 2.6

Robert Dailey rcdailey at gmail.com
Tue Aug 18 22:45:38 CEST 2009


On Aug 18, 3:41 pm, Jonathan Gardner <jgard... at jonathangardner.net>
wrote:
> On Aug 18, 11:19 am, Robert Dailey <rcdai... at gmail.com> wrote:
>
>
>
>
>
> > I'm looking for a way to parallelize my python script without using
> > typical threading primitives. For example, C++ has pthreads and TBB to
> > break things into "tasks". I would like to see something like this for
> > python. So, if I have a very linear script:
>
> > doStuff1()
> > doStuff2()
>
> > I can parallelize it easily like so:
>
> > create_task( doStuff1 )
> > create_task( doStuff2 )
>
> > Both of these functions would be called from new threads, and once
> > execution ends the threads would die. I realize this is a simple
> > example and I could create my own classes for this functionality, but
> > I do not want to bother if a solution already exists.
>
> If you haven't heard of the Python GIL, you'll want to find out sooner
> rather than later. Short summary: Python doesn't do threading very
> well.
>
> There are quite a few parallelization solutions out there for Python,
> however. (I don't know what they are off the top of my head, however.)
> The way they work is they have worker processes that can be spread
> across machines. When you want to parallelize a task, you send off a
> function to those worker processes.
>
> There are some serious caveats and problems, not the least of which is
> sharing code between the worker processes and the director, so this
> isn't a great solution.
>
> If you're looking for highly parallelized code, Python may not be the
> right answer. Try something like Erlang or Haskell.

Really, all I'm trying to do is the most trivial type of
parallelization. Take two functions, execute them in parallel. This
type of parallelization is called "embarrassingly parallel", and is
the simplest form. There are no dependencies between the two
functions. They do require read-only access to shared data, though.
And if they are being spawned as sub-processes this could cause
problems, unless the multiprocessing module creates pipes or other
means to handle this situation.
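For what it's worth, the multiprocessing module (new in 2.6) covers this case directly: each child process gets its own copy of the data at fork time, so read-only access is safe, and results come back to the parent over a pipe-backed Queue. A minimal sketch, with do_stuff1/do_stuff2 as hypothetical stand-ins for the two functions:

```python
import multiprocessing

# Stand-in for the shared, read-only data; each child process
# receives its own copy, so reads never race with the parent.
SHARED_DATA = [1, 2, 3, 4]

def do_stuff1(data, queue):
    # Hypothetical task 1: report a result back through the queue.
    queue.put(('task1', sum(data)))

def do_stuff2(data, queue):
    # Hypothetical task 2: independent of task 1.
    queue.put(('task2', max(data)))

if __name__ == '__main__':
    # Queue is backed by a pipe, so children can send results to the parent.
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=do_stuff1, args=(SHARED_DATA, queue))
    p2 = multiprocessing.Process(target=do_stuff2, args=(SHARED_DATA, queue))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    results = dict(queue.get() for _ in range(2))
    print(results)
```

Since the two tasks run in separate processes, the GIL is not an issue; the tradeoff is that the data is copied rather than shared, which only matters if it is large.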
