[Numpy-discussion] resizeable arrays using shared memory?
Sebastian Berg
sebastian at sipsolutions.net
Sat Feb 6 21:01:41 EST 2016
On Sa, 2016-02-06 at 16:56 -0600, Elliot Hallmark wrote:
> Hi all,
>
> I have a program that uses resize-able arrays. I already
> over-provision the arrays and use slices, but every now and then the
> data outgrows that array and it needs to be resized.
>
> Now, I would like to have these arrays shared between processes
> spawned via multiprocessing (for fast interprocess communication
> purposes, not for parallelizing work on an array). I don't care
> about mapping to a file on disk, and I don't want disk I/O happening.
> I don't care (really) about data being copied in memory on resize.
> I *do* want the array to be resized "in place", so that the child
> processes can still access the arrays from the object they were
> initialized with.
>
>
> I can share arrays easily using arrays that are backed by memmap.
> I.e.:
>
> ```
> #Source: http://github.com/rainwoodman/sharedmem
>
> import mmap
>
> import numpy
>
>
> class anonymousmemmap(numpy.memmap):
>     def __new__(subtype, shape, dtype=numpy.uint8, order='C'):
>
>         descr = numpy.dtype(dtype)
>         _dbytes = descr.itemsize
>
>         # total number of elements in the requested shape
>         shape = numpy.atleast_1d(shape)
>         size = 1
>         for k in shape:
>             size *= k
>
>         bytes = int(size*_dbytes)
>
>         if bytes > 0:
>             # anonymous mapping: not backed by any file on disk
>             mm = mmap.mmap(-1, bytes)
>         else:
>             mm = numpy.empty(0, dtype=descr)
>         self = numpy.ndarray.__new__(subtype, shape, dtype=descr,
>                                      buffer=mm, order=order)
>         self._mmap = mm
>         return self
>
>     def __array_wrap__(self, outarr, context=None):
>         # the expression must start on the same line as `return`
>         return numpy.ndarray.__array_wrap__(
>             self.view(numpy.ndarray), outarr, context)
> ```
>
> This cannot be resized because it does not own its own data
> (ValueError: cannot resize this array: it does not own its data).
> (numpy.memmap has this same issue [0], even if I set refcheck to
> False and even though the docs say otherwise [1]).
>
> arr._mmap.resize(x) fails because it is anonymous (error: [Errno 9]
> Bad file descriptor). If I create a file and use that fileno to
> create the memmap, then I can resize `arr._mmap` but the array itself
> is not resized.
>
> Is there a way to accomplish what I want? Or, do I just need to
> figure out a way to communicate new arrays to the child processes?
>
I guess the answer is no, but the first question should be whether you
can instead create a new, larger array that views the same data. Since
you have the mmap, that would just be creating a new view into it.
I.e. your "array" would be the memmap itself, and to use it, you always
rewrap it into a new numpy array.
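
A minimal sketch of that rewrapping idea, under some loud assumptions:
the class name `SharedGrowable`, its methods, and the 8-byte header that
holds the current element count are all hypothetical and not part of
numpy or sharedmem. It also assumes a Unix-like system where children
are forked, so that they inherit the anonymous mapping. The mapping is
over-provisioned once and never grows; only the numpy view is recreated.

```python
import mmap

import numpy as np


class SharedGrowable:
    """Hypothetical helper: fixed-capacity anonymous mmap, rewrapped on access."""

    HEADER = 8  # bytes reserved at the start for the current element count

    def __init__(self, capacity, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.capacity = int(capacity)
        # Anonymous mapping, no file on disk; sized for the full capacity.
        self._mm = mmap.mmap(-1, self.HEADER + self.capacity * self.dtype.itemsize)
        # The count lives inside the shared buffer, so a process that
        # inherited the mapping sees updates made by another process.
        self._count = np.ndarray((1,), dtype=np.int64, buffer=self._mm)

    def view(self):
        """Rewrap the buffer as a fresh array of the current logical length."""
        n = int(self._count[0])
        return np.ndarray((n,), dtype=self.dtype, buffer=self._mm,
                          offset=self.HEADER)

    def append(self, values):
        """Write values after the current end and bump the shared count."""
        values = np.asarray(values, dtype=self.dtype).ravel()
        n = int(self._count[0])
        if n + values.size > self.capacity:
            raise MemoryError("capacity exceeded; the mapping itself cannot grow")
        full = np.ndarray((self.capacity,), dtype=self.dtype, buffer=self._mm,
                          offset=self.HEADER)
        full[n:n + values.size] = values
        self._count[0] = n + values.size
```

Each process would call `view()` whenever it needs the data instead of
holding on to one array object. The sketch does no locking, so
concurrent writers would need their own synchronization.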
Other than that, you would have to mess with the internal ndarray
structure, since these kinds of operations appear rather unsafe.
- Sebastian
> Thanks,
> Elliot
>
> [0] https://github.com/numpy/numpy/issues/4198
>
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.resize.html