[Numpy-discussion] Should ndarray be a context manager?

Sturla Molden sturla.molden at gmail.com
Tue Dec 9 10:01:58 EST 2014


I wonder if ndarray should be a context manager so we can write 
something like this:


    with np.zeros(n) as x:
        [...]


The idea is that __exit__ would free the memory in x (if owned by x) 
and make x a zero-size array.

Unlike the current ndarray, which does not have an __exit__ method, this 
would give precise control over when the memory is freed. The timing of 
the memory release would not be dependent on the Python implementation, 
and a reference cycle or reference leak would not accidentally produce a 
memory leak. It would let us decide deterministically when the memory 
is freed, which is useful e.g. when working with large arrays.
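
As a rough illustration of these semantics using only the API that exists today, the sketch below wraps an array in a context manager and shrinks its buffer on exit. It assumes that ndarray.resize(0, refcheck=False) is an acceptable stand-in for freeing the memory (as far as I understand, it reallocates the owned buffer down to a minimal size rather than guaranteeing a free). scoped_array is a hypothetical helper name, not part of NumPy.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def scoped_array(arr):
    """Yield arr, then shrink it to a zero-size array on exit.

    Hypothetical helper: approximates the proposed ndarray.__exit__.
    """
    try:
        yield arr
    finally:
        # Only touch memory the array owns; views and wrapped external
        # buffers are left alone.
        if arr.flags.owndata:
            # refcheck=False skips NumPy's reference-count guard, so any
            # view of arr that is still alive is left dangling.
            arr.resize(0, refcheck=False)


with scoped_array(np.zeros(1000)) as x:
    x[:] = 1.0
    total = float(x.sum())

# After the block, x still exists as a name but holds no elements.
size_after = x.size
```

Like the proposal itself, this trades safety for control: nothing stops a view created inside the block from outliving the buffer.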


A problem with this is that the memory in the ndarray would be volatile 
with respect to other Python threads and view arrays: either could still 
reference the buffer after __exit__ has freed it. However, there are 
dozens of other ways to produce segfaults or buffer overflows with NumPy 
(cf. stride_tricks or wrapping external buffers).
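
To make the view problem concrete, here is a small sketch of how a view shares its parent's buffer and how NumPy's existing in-place resize guards against it. It assumes CPython, where the guard is based on reference counts; the variable names are only for illustration.

```python
import numpy as np

x = np.zeros(10)
v = x[:5]  # a view: no data of its own, it points into x's buffer

shares_buffer = v.base is x              # the view keeps x alive via .base
view_owns_data = bool(v.flags.owndata)   # views do not own their memory

try:
    # refcheck defaults to True, so NumPy is expected to refuse to
    # resize x in place while v still refers to it.
    x.resize(0)
    resize_refused = False
except ValueError:
    resize_refused = True
```

An __exit__ that frees unconditionally would have to bypass this kind of check, which is exactly where the volatility comes from.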


Below is a Cython class that does something similar, but we would need 
to e.g. write something like

     with Heapmem(n * np.double().itemsize) as hm:
         x = hm.doublearray
         [...]

instead of just

     with np.zeros(n) as x:
         [...]


Sturla


# (C) 2014 Sturla Molden

from cpython cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memset
cimport numpy as cnp
cnp.import_array()  # initialise the NumPy C API before any PyArray_* call


cdef class Heapmem:

     cdef:
         void *_pointer
         cnp.intp_t _size

     def __cinit__(Heapmem self, Py_ssize_t n):
         self._pointer = NULL
         self._size = <cnp.intp_t> n

     def __init__(Heapmem self, Py_ssize_t n):
         self.allocate()

     def allocate(Heapmem self):
         if self._pointer != NULL:
             raise RuntimeError("Memory already allocated")
         else:
             self._pointer = PyMem_Malloc(self._size)
             if (self._pointer == NULL):
                 raise MemoryError()
             memset(self._pointer, 0, self._size)

     def __dealloc__(Heapmem self):
         if self._pointer != NULL:
             PyMem_Free(self._pointer)
             self._pointer = NULL

     property pointer:
         def __get__(Heapmem self):
             return <cnp.intp_t> self._pointer

     property doublearray:
         def __get__(Heapmem self):
             cdef cnp.intp_t n = self._size//sizeof(double)
             if self._pointer != NULL:
                 return cnp.PyArray_SimpleNewFromData(1, &n,
                                  cnp.NPY_DOUBLE, self._pointer)
             else:
                 raise RuntimeError("Memory not allocated")

     property chararray:
         def __get__(Heapmem self):
             if self._pointer != NULL:
                 return cnp.PyArray_SimpleNewFromData(1, &self._size,
                                  cnp.NPY_CHAR, self._pointer)
             else:
                 raise RuntimeError("Memory not allocated")

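     def __enter__(Heapmem self):
         # Entering requires a live allocation; return self so that
         # "with Heapmem(...) as hm" binds hm to this object.
         if self._pointer == NULL:
             raise RuntimeError("Memory not allocated")
         return self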

     def __exit__(Heapmem self, type, value, traceback):
         if self._pointer != NULL:
             PyMem_Free(self._pointer)
             self._pointer = NULL







