2013/6/20 Mike Beller <mike@tradeworx.com>

Thank you both, for your encouragement.

So I have made progress. I now have some unit tests, and they can compile and run. I have imported the memmap.py from numpy, and modified it so it gets the needed items from micronumpy. I can run the unit tests and they fail for the correct reason -- that reason being the buffer attribute is not supported in interp_numarray.descr_new_array(). So that's great.

Great indeed!

Here are my next questions:

1) The way the numpy memmap module works, it calls ndarray.__new__(), and monkey-patches the return value to add _mmap, offset, and mode attributes. This, for example, ensures the mmap object is kept around until the array is deleted. However, I can't monkey-patch a numpy ndarray object. I presume this is because it is an interpreter level object rather than an app level one? Anyway -- not sure how to deal with this situation.

No, that's because ndarray.__new__(subtype, ...) returns an ndarray. This is wrong, it should return an instance of the subtype (the memmap.memmap class in this case).

ndarray has no __dict__, OTOH memmap is a Python-defined class and has a __dict__ where you can store attributes.

(It's also possible to have a __dict__ on ndarray, but it's not necessary here)

In interp_numarray.py, descr_new_array() does not use w_subtype at all! This means that ndarray cannot be subclassed in Python...
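
To illustrate the difference in plain Python, here is a minimal sketch. It uses stand-in classes, not the actual micronumpy code: BrokenArray, FixedArray, the two Memmap subclasses and try_patch() are names made up for this example, and __slots__ = () is only used to mimic a type without a per-instance __dict__.

```python
class BrokenArray(object):
    """Stand-in for an ndarray whose __new__ ignores the requested subtype."""
    __slots__ = ()  # no per-instance __dict__, like the base ndarray

    def __new__(subtype, *args):
        # The bug being illustrated: the base class is always built.
        return object.__new__(BrokenArray)


class FixedArray(object):
    """Stand-in for an ndarray whose __new__ honours the requested subtype."""
    __slots__ = ()

    def __new__(subtype, *args):
        return object.__new__(subtype)


class BrokenMemmap(BrokenArray):
    pass  # Python-defined subclass: its instances would have a __dict__


class FixedMemmap(FixedArray):
    pass


def try_patch(obj):
    """Try to attach an attribute the way memmap.py does; report success."""
    try:
        obj._mmap = "mmap placeholder"
    except AttributeError:
        return False
    return True


broken = BrokenArray.__new__(BrokenMemmap)  # comes back as a plain BrokenArray
fixed = FixedArray.__new__(FixedMemmap)     # comes back as a FixedMemmap

print(type(broken).__name__, try_patch(broken))
print(type(fixed).__name__, try_patch(fixed))
```

With the broken __new__ the caller gets the base type back and cannot attach attributes to it; with the fixed one the instance is of the subclass, so the attributes land in its __dict__.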

To make the necessary changes, you can pick one module and see how it's done there.

For example, in the bz2 module, __new__ could simply have written "return W_BZ2File(space)", but instead it handles subclasses correctly:

def descr_bz2file__new__(space, w_subtype, __args__):
    bz2file = space.allocate_instance(W_BZ2File, w_subtype)
    W_BZ2File.__init__(bz2file, space)
    return space.wrap(bz2file)

ndarray should do the same, maybe by changing W_NDimArray.from_shape() (and friends) this way:

@classmethod
def from_shape(w_subtype, shape, dtype, order='C'):
    ...
    if w_subtype is None:
        return W_NDimArray(impl)
    else:
        ...allocate_instance, __init__ and wrap...

This subclassing feature should have its own unit tests, btw.

2) Secondly, the mmap object itself doesn't really provide a usable buffer implementation. The implementation of buffer(mmap) is currently W_MMap.descr_buffer() (found in interp_mmap.py), which returns a StringLikeBuffer object. This object (implemented in pypy/interpreter/buffer.py) is a subclass of Buffer, which does not implement get_raw_address(). Our current plan clearly requires the buffer object to implement get_raw_address() so it can be used by ndarray.from_shape_and_storage(). Interestingly, it seems as if the interp_mmap author anticipated this shortcoming -- there is a comment: "improve to work directly on low-level address" right in the descr_buffer method.

So -- am I on the wrong path? Should I not even bother trying to use the mmap (since I can't monkey-patch it and it doesn't do what I want)? This would mean perhaps using the underlying rffi mmap to build my own memmap module. Alternatively, can I fix the monkey-patching problem some other way, and then take the advice of interp_mmap's author to "improve to work directly on low-level address" by returning something better than a StringLikeBuffer object?

This was the third task I mentioned earlier. It turns out that Armin implemented it just this morning, thanks! :-)
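
As an app-level analogy of why a raw address matters, here is a small sketch using only the standard mmap and ctypes modules. It does not use PyPy's interp-level Buffer API or get_raw_address(); it only contrasts string-like access, which copies bytes out of the mapping, with access through the mapping's memory address, which sees the data in place. The explicit "del view" before close() assumes CPython-style reference counting.

```python
import ctypes
import mmap

# Anonymous, writable mapping of one page.
mm = mmap.mmap(-1, mmap.PAGESIZE)
mm[:5] = b"hello"

# String-like access: slicing copies the bytes out of the mapping.
copied = mm[:5]

# Raw-address access: a ctypes array laid directly over the mapping.
view = (ctypes.c_char * len(mm)).from_buffer(mm)
address = ctypes.addressof(view)

# Change the mapping, then read it back through the raw address.
mm[:5] = b"HELLO"
seen_through_address = ctypes.string_at(address, 5)

print(copied)                # the earlier copy is unaffected by the change
print(seen_through_address)  # the address still points at the live mapping

# Release the exported buffer before closing the mapping.
del view
mm.close()
```

An array built on such an address shares storage with the mapping instead of holding a snapshot, which is what a memmap-backed array needs.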

Mike, you are doing well. Please keep going.

--
Amaury Forgeot d'Arc