[Numpy-discussion] A crazy masked-array thought

Nathaniel Smith njs at pobox.com
Fri Apr 27 09:55:16 EDT 2012


On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley
<rhattersley at gmail.com> wrote:
> I know I used a somewhat jokey tone in my original posting, but fundamentally
> it was a serious question concerning a live topic. So I'm curious about the
> lack of response. Has this all been covered before?
>
> Sorry if I'm being too impatient!

That's fine. I did read it, but I wasn't sure what to make of it, so I
didn't respond :-)

> On 25 April 2012 16:58, Richard Hattersley <rhattersley at gmail.com> wrote:
>>
>> The masked array discussions have brought up all sorts of interesting
>> topics - too many to usefully list here - but there's one aspect I haven't
>> spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
>> too awkward to be helpful. But ...
>>
>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?
>>
>> In the library I'm working on, the introduction of MAs (via numpy.ma)
>> required us to sweep through the library and make a fair few changes. That's
>> not the sort of thing one would normally expect from the introduction of a
>> subclass.
>>
>> Putting aside the ABI issue, would it help downstream API compatibility if
>> the POA was a subclass of the MA? Code that's expecting/casting-to a POA
>> might continue to work and, where appropriate, could be upgraded in its
>> own time to accept MAs.

This makes a certain amount of sense from a traditional OO modeling
perspective, where classes are supposed to refer to sets of objects
and subclasses are subsets and superclasses are supersets. This is the
property that's needed to guarantee that if A is a subclass of B, then
any code that expects a B can also handle an A, since all A's are B's,
which is what you need if you're doing type-checking or type-based
dispatch. And indeed, from this perspective, MAs are a superclass of
POAs, because for every POA there's an equivalent MA (the one with the
mask set to all-true), but not vice-versa.
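
A minimal sketch of that asymmetry, using numpy.ma as a stand-in for
"MA". One assumption to flag: numpy.ma uses the convention that True in
the mask *hides* an element, so the "nothing masked" array there has an
all-False mask. The variable names are only for illustration.

```python
import numpy as np
import numpy.ma as ma

plain = np.array([1.0, 2.0, 3.0])

# POA -> MA: wrap the data with a mask that hides nothing. No information
# is lost, so filling the (nonexistent) masked slots gives the data back.
as_masked = ma.masked_array(plain, mask=False)
round_trip = as_masked.filled(np.nan)

# MA -> POA: there is nowhere to put the mask, so it is simply dropped and
# the value stored under the masked slot becomes visible again.
partly_masked = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
stripped = np.asarray(partly_masked)

print(np.array_equal(round_trip, plain))   # True
print(partly_masked.sum(), stripped.sum())  # 4.0 6.0
```

The second conversion is the lossy direction: the two sums disagree
because the plain array cannot represent "this element is missing".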

But, that model of OO doesn't have much connection to Python. In
Python's semantics, classes are almost irrelevant; they're mostly just
some convenience tools for putting together the objects you want, and
what really matters is the behavior of each object (the famous "duck
typing"). You can call isinstance() if you want, but it's just an
ordinary function that looks at some attributes on an object; the only
magic involved is that some of those attributes have underscores in
their name. In Python, subclassing mostly does two things: (1) it's a
quick way to set up a class that's similar to another class
(though this is a worse idea than it looks -- you're basically doing
'from other_class import *' with all the usual tight-coupling problems
that 'import *' brings). (2) When writing Python objects at the C
level, subclassing lets you achieve memory layout compatibility (which
is important because C does *not* do duck typing), and it lets you add
new fields to a C struct.
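
To make the duck-typing point concrete, here is a small pure-Python
sketch. The classes and the roughly_isinstance() helper are hypothetical
names invented for illustration; the helper is a deliberate
simplification, since the real isinstance() also consults
__instancecheck__ on the metaclass and the object's __class__ attribute.

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:
    """Unrelated to Duck by inheritance, but offers the same method."""

    def quack(self):
        return "Beep"


def make_noise(thing):
    # Duck typing: no class check, just use the behaviour we need.
    return thing.quack()


def roughly_isinstance(obj, cls):
    # Simplified stand-in for isinstance(): look the class up in the
    # method resolution order, which is just an attribute on the type.
    return cls in type(obj).__mro__


class LoudDuck(Duck):
    # Subclassing pulls in every attribute of Duck, wanted or not;
    # here we override the one method we care about.
    def quack(self):
        return super().quack().upper() + "!"


print(make_noise(Duck()), make_noise(Robot()), make_noise(LoudDuck()))
print(roughly_isinstance(LoudDuck(), Duck), roughly_isinstance(Robot(), Duck))
```

make_noise() works for Robot even though the class check says it is not
a Duck, which is the sense in which classes are "almost irrelevant" to
whether code can handle an object.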

So at this level, MAs are a subclass of POAs, because MAs have an
extra field that POAs don't...
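
The existing numpy.ma shows the same direction of inheritance. Note the
assumption: numpy.ma is written in Python, not C, so this only
illustrates the "subclass adds state" relationship, not the C struct
layout discussed above.

```python
import numpy as np
import numpy.ma as ma

plain = np.array([1, 2, 3])
masked = ma.masked_array([1, 2, 3], mask=[False, True, False])

# In the class hierarchy, the masked array is the subclass...
is_subclass = issubclass(ma.MaskedArray, np.ndarray)
passes_check = isinstance(masked, np.ndarray)

# ...and it carries state (the mask) that a plain array lacks.
plain_has_mask = hasattr(plain, "mask")
masked_has_mask = hasattr(masked, "mask")

print(is_subclass, passes_check, plain_has_mask, masked_has_mask)
# True True False True
```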

So I don't know what to think about subclasses/superclasses here,
because they're such confusing and contradictory concepts that it's
hard to tell what the actual resulting API semantics would be.

- N


