Parallel processing on shared data structures

MRAB google at mrabarnett.plus.com
Thu Mar 19 14:14:08 EDT 2009


psaffrey at googlemail.com wrote:
> I'm filing 160 million data points into a set of bins based on their
> position. At the moment, this takes just over an hour using interval
> trees. I would like to parallelise this to take advantage of my quad
> core machine. I have some experience of Parallel Python, but PP seems
> to only really work for problems where you can do one discrete bit of
> processing and recombine these results at the end.
> 
> I guess I could thread my code and use mutexes to protect the shared
> lists that everybody is filing into. However, my understanding is that
> Python threads still run in a single process (under the GIL), so this
> won't give me multi-core.
> 
> Does anybody have any suggestions for this?
> 
Could you split your data set, run multiple instances of the script at
the same time, and then merge the corresponding lists?
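
If you go that route, the multiprocessing module (new in Python 2.6)
can also do the splitting and merging within a single script. Here's a
rough sketch only: the fixed-width binning, the sample data and the
chunk sizes are just placeholders standing in for your interval-tree
lookup and your real 160 million points.

import multiprocessing

BIN_WIDTH = 1000  # placeholder for the real interval boundaries

def bin_chunk(positions):
    # Worker: file one chunk of positions into its own dict of bins.
    # Simple integer division stands in for the interval-tree lookup.
    bins = {}
    for pos in positions:
        bins.setdefault(pos // BIN_WIDTH, []).append(pos)
    return bins

def merge(per_chunk_bins):
    # Combine the per-worker results into a single dict of lists.
    merged = {}
    for bins in per_chunk_bins:
        for key, points in bins.items():
            merged.setdefault(key, []).extend(points)
    return merged

if __name__ == '__main__':
    data = range(1000000)          # stand-in for the real data points
    n = 4                          # one worker per core
    size = len(data) // n + 1
    chunks = [data[i * size:(i + 1) * size] for i in range(n)]

    pool = multiprocessing.Pool(processes=n)
    merged_bins = merge(pool.map(bin_chunk, chunks))
    pool.close()
    pool.join()

Each worker builds its own bins, so nothing is shared and no locks are
needed; the only serial part is the final merge, which should be cheap
compared with the binning itself.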


