[Numpy-discussion] A crazy masked-array thought
Neal Becker
ndbecker2 at gmail.com
Sat Apr 28 12:58:36 EDT 2012
Nathaniel Smith wrote:
> On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
> <rhattersley at gmail.com> wrote:
>> So, assuming numpy.ndarray became a strict subclass of some new masked
>> array, it looks plausible that adding just a few checks to numpy.ndarray to
>> exclude the masked superclass would prevent much downstream code from
>> accidentally operating on masked arrays.
>
> I think the main point I was trying to make is that it's the existence
> and content of these checks that matters. They don't necessarily have
> any relation at all to which thing Python calls a "superclass" or a
> "subclass".
>
> -- Nathaniel
I don't agree with the argument that ma should be a superclass of ndarray. It
is ma that is adding features. That makes it a subclass. We're not talking
mathematics here.
There is a well-known disease of OOP where everything seems to bubble up to the
top of the class hierarchy - so that the base class becomes bloated to support
every feature needed by subclasses. I believe that's considered poor design.
Is there a way to support ma as a subclass of ndarray, without introducing
overhead into ndarray? I haven't given this much real thought, but I do have
an idea. What are the operations that we need on arrays? The most basic are:
1. element access
2. get size (shape)
In an OO design, these would be virtual functions (or in C, function
pointers). But that would put an indirect call on every element access, which
is unacceptable overhead.
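To make the virtual-function option concrete, here is a minimal C++ sketch. All names in it (ArrayBase, PlainArray, MaskedArray, valid, sum) are hypothetical and are not part of NumPy; valid() is an assumed extra hook for skipping masked elements, and arrays are one-dimensional doubles to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface, not NumPy's: the basic operations as virtual functions.
struct ArrayBase {
    virtual ~ArrayBase() = default;
    virtual std::size_t size() const = 0;
    virtual double at(std::size_t i) const = 0;
    // Assumed extra hook: a plain array never masks anything.
    virtual bool valid(std::size_t) const { return true; }
};

struct PlainArray : ArrayBase {
    std::vector<double> data;
    explicit PlainArray(std::vector<double> d) : data(std::move(d)) {}
    std::size_t size() const override { return data.size(); }
    double at(std::size_t i) const override {
        assert(i < data.size());
        return data[i];
    }
};

// The masked array adds a feature, so it is the subclass.
struct MaskedArray : PlainArray {
    std::vector<bool> mask;  // true means "masked out"
    MaskedArray(std::vector<double> d, std::vector<bool> m)
        : PlainArray(std::move(d)), mask(std::move(m)) {
        assert(mask.size() == data.size());
    }
    bool valid(std::size_t i) const override { return !mask[i]; }
};

// One compiled copy serves both types, at the price of indirect calls per element.
double sum(const ArrayBase& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a.valid(i)) total += a.at(i);
    }
    return total;
}
```

Here sum() is written once against the base interface, so every call to size(), valid() and at() goes through the vtable, even when the argument is a plain array.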
In a generic programming design (C++ templates), we would essentially generate 2
copies of every function, one that operates on plain arrays and one that
operates on masked arrays, each using the appropriate function for element
access, shape, etc. This way, no unneeded overhead is introduced (although the
code size is increased, which is probably of little consequence on a modern
demand-paged OS).
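The template approach might look roughly like the following C++ sketch. As before, the type and function names are hypothetical, valid() is an assumed hook for masked elements, and the arrays are one-dimensional doubles; it illustrates the idea only and says nothing about how NumPy is actually built.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two unrelated types exposing the same (hypothetical) operations.
struct PlainArray {
    std::vector<double> data;
    std::size_t size() const { return data.size(); }
    double at(std::size_t i) const {
        assert(i < data.size());
        return data[i];
    }
    // Always true, so the compiler can drop the check in the plain instantiation.
    bool valid(std::size_t) const { return true; }
};

struct MaskedArray {
    std::vector<double> data;
    std::vector<bool> mask;  // true means "masked out"
    std::size_t size() const { return data.size(); }
    double at(std::size_t i) const {
        assert(i < data.size());
        return data[i];
    }
    bool valid(std::size_t i) const { return !mask[i]; }
};

// Written once; the compiler generates one copy per array type it is used with.
template <typename Array>
double sum(const Array& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a.valid(i)) total += a.at(i);
    }
    return total;
}
```

sum<PlainArray> and sum<MaskedArray> are separate functions in the binary, and the calls inside each are resolved at compile time, so the plain-array copy carries no masking cost. The two structs need no inheritance relation for this to work.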
Following this approach, ma and ndarray don't have to have any inheritance
relation. OTOH, inheritance is probably useful since ma and ndarray have many
features in common, and a lot of code could be shared.