[Numpy-discussion] A crazy masked-array thought

Neal Becker ndbecker2 at gmail.com
Sat Apr 28 12:58:36 EDT 2012


Nathaniel Smith wrote:

> On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
> <rhattersley at gmail.com> wrote:
>> So, assuming numpy.ndarray became a strict subclass of some new masked
>> array, it looks plausible that adding just a few checks to numpy.ndarray to
>> exclude the masked superclass would prevent much downstream code from
>> accidentally operating on masked arrays.
> 
> I think the main point I was trying to make is that it's the existence
> and content of these checks that matters. They don't necessarily have
> any relation at all to which thing Python calls a "superclass" or a
> "subclass".
> 
> -- Nathaniel

I don't agree with the argument that ma should be a superclass of ndarray.  It 
is ma that is adding features.  That makes it a subclass.  We're not talking 
mathematics here.

There is a well-known disease of OOP where everything seems to bubble up to the 
top of the class hierarchy - so that the base class becomes bloated to support 
every feature needed by subclasses.  I believe that's considered poor design.

Is there a way to support ma as a subclass of ndarray without introducing
overhead into ndarray?  I haven't given this much real thought, but I do have
one idea.  What are the operations that we need on arrays?  The most basic are:

1. element access
2. get size (shape)

In an OO design, these would be virtual functions (or in C, pointers to
functions).  But an indirect call on every element access would introduce
unacceptable overhead.
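
A minimal C++ sketch of that OO design follows.  The names ArrayBase,
PlainArray, MaskedArray and sum_virtual are made up for illustration and are
not part of numpy's C API, and a flat 1-d array of doubles is assumed to keep
it short:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface: size and element access are virtual functions.
struct ArrayBase {
    virtual ~ArrayBase() = default;
    virtual std::size_t size() const = 0;
    virtual double get(std::size_t i) const = 0;
    // Masked elements are skipped by reductions; plain arrays have none.
    virtual bool is_masked(std::size_t) const { return false; }
};

struct PlainArray : ArrayBase {
    std::vector<double> data;
    explicit PlainArray(std::vector<double> d) : data(std::move(d)) {}
    std::size_t size() const override { return data.size(); }
    double get(std::size_t i) const override { return data[i]; }
};

// The masked array adds a feature, so here it is the subclass.
struct MaskedArray : PlainArray {
    std::vector<bool> mask;  // true means "ignore this element"
    MaskedArray(std::vector<double> d, std::vector<bool> m)
        : PlainArray(std::move(d)), mask(std::move(m)) {
        assert(mask.size() == data.size());
    }
    bool is_masked(std::size_t i) const override { return mask[i]; }
};

// One compiled copy serves both types, but every loop iteration goes
// through indirect calls, even when the array is a plain one.
double sum_virtual(const ArrayBase& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (!a.is_masked(i)) total += a.get(i);
    return total;
}
```

The point of the sketch is the inner loop: the plain array pays for the
is_masked() and get() dispatch even though it never has a mask.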

In a generic programming design (C++ templates), we would essentially generate
two copies of every function, one that operates on plain arrays and one that
operates on masked arrays, each using the appropriate function for element
access, shape, etc.  This way, no unneeded overhead is introduced (although the
code size is increased, this is probably of little consequence on a modern
demand-paged OS).
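
The generic approach can be sketched in C++ as below.  Again, PlainArray,
MaskedArray and sum are hypothetical names chosen for illustration, not
anything in numpy, and a flat 1-d array of doubles is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical plain array: no mask storage, no virtual functions.
struct PlainArray {
    std::vector<double> data;
    std::size_t size() const { return data.size(); }
    double get(std::size_t i) const { return data[i]; }
    // Always false, so an optimizing compiler can typically drop the check.
    bool is_masked(std::size_t) const { return false; }
};

// Hypothetical masked array: same interface, no inheritance relation.
struct MaskedArray {
    std::vector<double> data;
    std::vector<bool> mask;  // true means "ignore this element"
    MaskedArray(std::vector<double> d, std::vector<bool> m)
        : data(std::move(d)), mask(std::move(m)) {
        assert(mask.size() == data.size());
    }
    std::size_t size() const { return data.size(); }
    double get(std::size_t i) const { return data[i]; }
    bool is_masked(std::size_t i) const { return mask[i]; }
};

// The compiler instantiates one copy of sum() per array type it is used with.
template <typename Array>
double sum(const Array& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (!a.is_masked(i)) total += a.get(i);
    return total;
}
```

Calling sum() on a PlainArray and on a MaskedArray produces two separate
instantiations, and all calls are resolved at compile time, so the plain
version carries no dispatch cost for masking.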

Following this approach, ma and ndarray don't need to have any inheritance
relation.  OTOH, inheritance is probably useful, since ma and ndarray have many
features in common and a lot of code could be shared.





