Hi,

I was trying to parallelize some algorithms and needed a writable array
shared between processes. It turned out to be quite simple and gave a nice
speedup, almost linear in the number of cores. Of course, you need to know
what you are doing to avoid segfaults and the like, but I still think
something like this should be included with NumPy for power users.

It works by inheriting anonymously mmapped memory. I am not sure whether it
works on Windows.

    import numpy as np
    import multiprocessing as mp

    class shared(np.ndarray):
        """Writable array backed by shared memory."""

        def __new__(subtype, shape, interface=None):
            if interface is None:
                # Allocate anonymous shared memory; child processes
                # inherit it on fork.
                size = int(np.prod(shape))
                buffer = mp.RawArray('d', size)
                self = np.ndarray.__new__(subtype, shape, float, buffer)
            else:
                # Reconstruct a view of the same memory from a pickled
                # __array_interface__.
                class Dummy(object):
                    pass
                buffer = Dummy()
                buffer.__array_interface__ = interface
                a = np.asarray(buffer)
                self = np.ndarray.__new__(subtype, shape=a.shape, buffer=a)
            return self

        def __reduce_ex__(self, protocol):
            # Pickle only the shape and the memory description, not the data.
            return shared, (self.shape, self.__array_interface__)

        def __reduce__(self):
            return self.__reduce_ex__(0)

Also see the attached file for example usage.

Erik
Erik Rigtorp, on 2010-12-30 21:30, wrote:
> Hi,
>
> I was trying to parallelize some algorithms and needed a writable array
> shared between processes. It turned out to be quite simple and gave a
> nice speedup, almost linear in the number of cores. Of course, you need
> to know what you are doing to avoid segfaults and the like, but I still
> think something like this should be included with NumPy for power users.
>
> It works by inheriting anonymously mmapped memory. I am not sure whether
> it works on Windows.
--snip--

I've successfully used (what I think is) Sturla Molden's shmem_as_ndarray,
as outlined here [1] and here [2], for these purposes.

1. http://groups.google.com/group/comp.lang.python/browse_thread/thread/79fcf02...
2. http://folk.uio.no/sturlamo/python/multiprocessing-tutorial.pdf

--
Paul Ivanov
314 address only used for lists, off-list direct email at:
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7
On Fri, Dec 31, 2010 at 02:13, Paul Ivanov wrote:
> Erik Rigtorp, on 2010-12-30 21:30, wrote:
>> Hi,
>>
>> I was trying to parallelize some algorithms and needed a writable array
>> shared between processes. It turned out to be quite simple and gave a
>> nice speedup, almost linear in the number of cores. Of course, you need
>> to know what you are doing to avoid segfaults and the like, but I still
>> think something like this should be included with NumPy for power users.
>>
>> It works by inheriting anonymously mmapped memory. I am not sure whether
>> it works on Windows.
> --snip--
>
> I've successfully used (what I think is) Sturla Molden's shmem_as_ndarray,
> as outlined here [1] and here [2], for these purposes.

Yeah, I saw that code too. My implementation is even more lax, but easier
to use: it sends arrays to subprocesses by memory reference. Dangerous:
yes; effective: very.

It would be nice if we could stamp out some good, effective patterns for
using multiprocessing and include them with NumPy. The best solution is
probably a parallel_for function:

    def parallel_for(func, inherit_args, iterable):
        ...

where func should be

    def func(inherit_args, item):
        ...

and parallel_for makes sure inherit_args are viewable as a shared() array
with writable shared memory.

Erik
participants (2)
- Erik Rigtorp
- Paul Ivanov