[Numpy-discussion] resizeable arrays using shared memory?

Feng Yu rainwoodman at gmail.com
Tue Feb 9 04:51:36 EST 2016


Hi,

If the base address and size of the anonymous memory map are 'shared',
then one can protect them with a lock, grow the memmap with mremap (or
munmap and mmap again, or other tricks), and release the lock. During the
'resize' call, any reference to the array from Python in other
processes could simply spin on the lock.

This is probably better defined than using signals, but I am not sure
how to enforce the spinning when an object is referenced.

One possibility is to insist that a 'resizable' mmap must be
accessed via a context manager, e.g.

growable = shm.growable(initsize)  # proposed API

rank = fork_processes()  # hypothetical helper: fork workers, return this rank

if rank == 0:
    growable.grow(fill=0, size=10)
else:
    with growable as a:
        a += 10
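
The lock-plus-context-manager idea above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an existing API: `GrowableArray` is a hypothetical name; processes are assumed to be created with fork, so the file descriptor, lock and size counter are inherited; and the backing store is a RAM-backed file from `os.memfd_create` (Linux, Python 3.8+) with a temporary-file fallback, since Elliot reports in the quoted message that the anonymous map could not be resized. Instead of mremap, each process simply maps the enlarged file again, and readers block on a `multiprocessing.Lock` rather than spinning.

```python
import mmap
import multiprocessing
import os
import tempfile

import numpy as np


class GrowableArray:
    """Sketch: 1-D shared array that can only grow; use it via `with`.

    Assumes fork-started processes, so the fd, lock and size counter are
    inherited by every child (hypothetical class, not a library API).
    """

    def __init__(self, size, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        if hasattr(os, "memfd_create"):
            # Linux: anonymous, RAM-backed file that can be resized.
            self._file = None
            self._fd = os.memfd_create("growable")
        else:
            # Assumed fallback elsewhere: an unlinked temporary file.
            self._file = tempfile.TemporaryFile()
            self._fd = self._file.fileno()
        self._lock = multiprocessing.Lock()
        # Current length in elements, in memory shared across forks.
        self._size = multiprocessing.RawValue("q", 0)
        self._mm = None
        self._mapped = 0  # length this process mapped most recently
        self.grow(size)

    def _view(self):
        # Caller must hold the lock. Map again only if the size changed;
        # older maps stay alive as long as some array still uses them.
        size = self._size.value
        if size == 0:
            return np.empty(0, dtype=self.dtype)
        if size != self._mapped:
            self._mm = mmap.mmap(self._fd, size * self.dtype.itemsize)
            self._mapped = size
        return np.ndarray((size,), dtype=self.dtype, buffer=self._mm)

    def grow(self, size, fill=0):
        # The lock is not reentrant: never call grow() inside a `with` block.
        with self._lock:
            old = self._size.value
            if size <= old:
                return
            os.ftruncate(self._fd, size * self.dtype.itemsize)
            self._size.value = size
            self._view()[old:] = fill

    def __enter__(self):
        self._lock.acquire()  # blocks while another process is growing
        return self._view()

    def __exit__(self, *exc_info):
        self._lock.release()
        return False
```

Because the sketch only ever grows the file, an array obtained earlier stays valid for its old length after another process grows the data; it just does not see the new elements until the next `with` block.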


Yu

On Sun, Feb 7, 2016 at 3:11 PM, Elliot Hallmark <Permafacture at gmail.com> wrote:
> That makes sense.  I could either send a signal to the child process letting
> it know to re-instantiate the numpy array using the same (but now resized)
> buffer, or I could have it check to see if the buffer has been resized when
> it might need it and re-instantiate then.  That's actually not too bad.  It
> would be nice if the array could be resized, but it's probably unstable to
> do so and there isn't much demand for it.
>
> Thanks,
>   Elliot
>
> On Sat, Feb 6, 2016 at 8:01 PM, Sebastian Berg <sebastian at sipsolutions.net>
> wrote:
>>
>> On Sat, 2016-02-06 at 16:56 -0600, Elliot Hallmark wrote:
>> > Hi all,
>> >
>> > I have a program that uses resizable arrays.  I already over-provision
>> > the arrays and use slices, but every now and then the data outgrows
>> > that array and it needs to be resized.
>> >
>> > Now, I would like to have these arrays shared between processes
>> > spawned via multiprocessing (for fast interprocess communication
>> > purposes, not for parallelizing work on an array).  I don't care
>> > about mapping to a file on disk, and I don't want disk I/O happening.
>> >   I don't care (really) about data being copied in memory on resize.
>> > I *do* want the array to be resized "in place", so that the child
>> > processes can still access the arrays from the object they were
>> > initialized with.
>> >
>> >
>> > I can easily share arrays that are backed by a memmap, e.g.:
>> >
>> >     ```
>> >     #Source: http://github.com/rainwoodman/sharedmem
>> >
>> >     import mmap
>> >
>> >     import numpy
>> >
>> >
>> >     class anonymousmemmap(numpy.memmap):
>> >         def __new__(subtype, shape, dtype=numpy.uint8, order='C'):
>> >
>> >             descr = numpy.dtype(dtype)
>> >             _dbytes = descr.itemsize
>> >
>> >             shape = numpy.atleast_1d(shape)
>> >             size = 1
>> >             for k in shape:
>> >                 size *= k
>> >
>> >             bytes = int(size*_dbytes)
>> >
>> >             if bytes > 0:
>> >                 # anonymous shared mapping, inherited by forked children
>> >                 mm = mmap.mmap(-1,bytes)
>> >             else:
>> >                 mm = numpy.empty(0, dtype=descr)
>> >             self = numpy.ndarray.__new__(subtype, shape, dtype=descr,
>> >                                          buffer=mm, order=order)
>> >             self._mmap = mm
>> >             return self
>> >
>> >         def __array_wrap__(self, outarr, context=None):
>> >             return numpy.ndarray.__array_wrap__(
>> >                 self.view(numpy.ndarray), outarr, context)
>> >     ```
>> >
>> > This cannot be resized because it does not own its own data
>> > (ValueError: cannot resize this array: it does not own its data).
>> > (numpy.memmap has this same issue [0], even if I set refcheck to
>> > False and even though the docs say otherwise [1]).
>> >
>> > arr._mmap.resize(x) fails because it is anonymous (error: [Errno 9]
>> > Bad file descriptor).  If I create a file and use that fileno to
>> > create the memmap, then I can resize `arr._mmap` but the array itself
>> > is not resized.
>> >
>> > Is there a way to accomplish what I want?  Or, do I just need to
>> > figure out a way to communicate new arrays to the child processes?
>> >
>>
>> I guess the answer is no, but the first question is whether you can
>> simply create a new, larger array viewing the same data. Since you have
>> the mmap, that would just mean creating a new view into it.
>>
>> I.e. your "array" would be the memmap, and to use it, you always rewrap
>> it into a new numpy array.
>>
>> Other than that, you would have to mess with the internal ndarray
>> structure, since these kinds of operations appear rather unsafe.
>>
>> - Sebastian
>>
>>
>> > Thanks,
>> >   Elliot
>> >
>> > [0] https://github.com/numpy/numpy/issues/4198
>> >
>> > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.resize.html
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>
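
Sebastian's suggestion in the quoted reply, treating the mmap itself as the resizable object and rewrapping it in a fresh ndarray after every resize, can be sketched for a single process as follows. This is only a sketch under assumptions: `wrap` is a hypothetical helper, the map is backed by an unlinked temporary file because Elliot reports that resizing the anonymous map failed, and `mmap.resize` relies on mremap on Unix, so it is not available on every platform. Every existing view must be dropped before the resize, because the mapping may move in memory; other processes would likewise have to map the grown file again and rewrap it.

```python
import mmap
import tempfile

import numpy as np


def wrap(mm, dtype=np.float64):
    """Hypothetical helper: a fresh ndarray viewing the whole mmap (no copy)."""
    dtype = np.dtype(dtype)
    return np.ndarray((len(mm) // dtype.itemsize,), dtype=dtype, buffer=mm)


itemsize = np.dtype(np.float64).itemsize

# Assumption: a file-backed map, so that mmap.resize() can grow it.
backing = tempfile.TemporaryFile()
backing.truncate(4 * itemsize)
mm = mmap.mmap(backing.fileno(), 4 * itemsize)

a = wrap(mm)
a[:] = 1.0
print(a.flags.owndata)  # False, which is why ndarray.resize() refuses this array

del a                    # drop every view first, the mapping may move ...
mm.resize(8 * itemsize)  # ... then grow the map (and the file) ...
a = wrap(mm)             # ... and rewrap it as a new, larger array
```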


