[Numpy-discussion] Behavior of .base

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Oct 2 03:38:11 EDT 2012

On 10/01/2012 04:56 PM, Charles R Harris wrote:
> On Mon, Oct 1, 2012 at 8:40 AM, Thouis (Ray) Jones <thouis at gmail.com> wrote:
>     On Mon, Oct 1, 2012 at 8:20 AM, Nathaniel Smith <njs at pobox.com> wrote:
>      > [...]
>      > How can we discourage people from doing this in the future? Can we
>      > make .base write-only from the Python level (with suitable
>      > deprecation period)? Rename it to ._base (likewise) so that it's
>      > still possible to peek under the covers but we remind people that
>      > it's really an implementation detail with poorly defined semantics
>      > that might change?
>     Could we use the simpler .base behavior (fully collapsing the .base
>     chain), but be more aggressive about propagating information like
>     address/filename/offset for np.arrays that are created by slicing,
>     asarray(), etc.?
>     Ray
>     (Sorry if I'm missing some context that makes this suggestion idiotic.
>     I'm still trying to catch back up on the list and may have missed
>     relevant discussion on other threads.)
> It might be productive to step back a bit and ask if this is a memmap
> problem or a workflow problem. My impression is that pickling memmaps is
> a solution to a higher level problem in Scikits.learn workflow and I'd
> like more details on what that problem is.

I'm not a scikit-learn developer, but I'm pretty sure this is about
wanting to use multiprocessing to parallelise code. You send pickled
views of arrays between processes, but the underlying memory is shared
among all of them (using either a file or process-shared memory).

It would be cool to have some support for this in NumPy itself. The
scikit-learn people should chime in here, but here is a suggestion:

# Proposed API only: arr.byref does not exist in NumPy today.
from pickle import dumps, loads

# pickles by reference to process-shared memory, or raises an exception
# if the memory can't be process-shared
s = dumps(arr.byref)

# in another process:
arr = loads(s)
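
To make the idea a bit more concrete, here is a rough sketch of by-reference
pickling for the file-backed case only. The names dumps_byref and loads_byref
are made up for illustration and are not a NumPy API. The sketch assumes the
array is a C-contiguous view of an np.memmap, and it walks .base to find the
backing map, which is the kind of reliance on .base this thread is worried
about, so treat it as fragile.

```python
import mmap
import os
import pickle
import tempfile

import numpy as np


def _data_address(a):
    # Address of the first byte of the array's data buffer.
    return a.__array_interface__["data"][0]


def dumps_byref(arr):
    """Pickle where the data lives (file, offset, dtype, shape), not the data.

    Hypothetical helper, not a NumPy API. Raises TypeError if the array is
    not a C-contiguous view of a file-backed np.memmap.
    """
    # Walk the .base chain until we reach the array that wraps the mmap itself.
    backing = arr
    while backing is not None and not isinstance(
        getattr(backing, "base", None), mmap.mmap
    ):
        backing = getattr(backing, "base", None)

    if not isinstance(backing, np.memmap) or backing.filename is None:
        raise TypeError("memory cannot be shared between processes")
    if not arr.flags.c_contiguous:
        raise TypeError("only C-contiguous views are handled in this sketch")

    # File offset of the view = offset of the backing map + distance in bytes
    # between the start of the backing map's data and the start of the view.
    offset = backing.offset + _data_address(arr) - _data_address(backing)
    ref = {
        "filename": str(backing.filename),
        "offset": offset,
        "dtype": arr.dtype,
        "shape": arr.shape,
    }
    return pickle.dumps(ref)


def loads_byref(payload, mode="r+"):
    """Re-open the file region described by a dumps_byref() payload."""
    ref = pickle.loads(payload)
    return np.memmap(
        ref["filename"],
        dtype=ref["dtype"],
        mode=mode,
        offset=ref["offset"],
        shape=ref["shape"],
    )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    data = np.memmap(path, dtype=np.float64, mode="w+", shape=(10,))
    data[:] = np.arange(10)
    data.flush()

    payload = dumps_byref(data[3:7])  # holds only the reference, not the data
    view = loads_byref(payload)       # would normally run in another process
    print(view)
```

An ordinary in-memory array has no file behind it, so dumps_byref raises
instead of silently copying the data, which matches the "or raises an
exception" part of the suggestion above.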

Of course, *real* fixes would be to remove the GIL, or push forward the 
work in CPython on multiple independent interpreters in the same 
process. But that's rather more difficult.

Dag Sverre
