Parallel processing on shared data structures
google at mrabarnett.plus.com
Thu Mar 19 19:14:08 CET 2009
psaffrey at googlemail.com wrote:
> I'm filing 160 million data points into a set of bins based on their
> position. At the moment, this takes just over an hour using interval
> trees. I would like to parallelise this to take advantage of my quad
> core machine. I have some experience of Parallel Python, but PP seems
> to only really work for problems where you can do one discrete bit of
> processing and recombine these results at the end.
> I guess I could thread my code and use mutexes to protect the shared
> lists that everybody is filing into. However, my understanding is that
> Python is still only using one process so this won't give me multi-core.
> Does anybody have any suggestions for this?
Could you split your data set and run multiple instances of the script
at the same time and then merge the corresponding lists?
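
A rough sketch of that split-and-merge idea, in case it helps. It uses the
standard multiprocessing module in place of launching separate copies of the
script by hand, but the principle is the same: each worker files its own slice
into private bins, so nothing is shared and no locks are needed. I don't know
what your binning code looks like, so BIN_WIDTH and the fixed-width bin_chunk()
below are purely hypothetical stand-ins for your interval-tree lookup, and the
other function names are made up for illustration too.

```python
from multiprocessing import Pool

BIN_WIDTH = 100  # hypothetical: fixed-width bins stand in for the real lookup


def bin_chunk(positions):
    """File one slice of positions into a private dict of bins."""
    bins = {}
    for pos in positions:
        bins.setdefault(pos // BIN_WIDTH, []).append(pos)
    return bins


def merge_bins(partials):
    """Merge the corresponding lists from each worker's partial result."""
    merged = {}
    for partial in partials:
        for key, items in partial.items():
            merged.setdefault(key, []).extend(items)
    return merged


def split(data, n):
    """Cut data into at most n contiguous slices of near-equal size."""
    size = max(1, (len(data) + n - 1) // n)
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_bin(data, workers=4):
    """Bin each slice in its own process, then merge the results."""
    with Pool(workers) as pool:
        return merge_bins(pool.map(bin_chunk, split(data, workers)))


if __name__ == "__main__":
    result = parallel_bin(list(range(1000)))
    print(len(result), "bins filled")
```

Bear in mind that the slices and the partial results are pickled on their way
to and from the workers, so with 160 million points it may be cheaper to have
each worker read its own part of the input file rather than pass the data
through the parent process.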