[Numpy-discussion] backwards compatibility and deprecation policy NEP

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Sun Jul 22 19:31:28 EDT 2018


Hi Ralf,


>> Overall, this looks good. But I think the subclassing section is somewhat
>> misleading in suggesting `ndarray` is not well designed to be subclassed.
>> At least, in neither my work on Quantity nor that on MaskedArray have I
>> found that the design of `ndarray` itself was a problem. Instead, it was
>> the functions that were, as most were not written with subclassing or duck
>> typing in mind, but rather with the assumption that all input should be an
>> array, and that somehow it is useful to pass anything users pass in through
>> `asarray`, with layers then added on top to avoid this in specific
>> circumstances... But perhaps this is what you meant? (I would agree,
>> though, that some ndarray subclasses have been designed poorly -
>> especially, matrix, which then led to a problematic duck array in sparse -
>> and that this has resulted in substantial hassle. Also, subclassing the
>> subclasses is much more problematic than subclassing ndarray - MaskedArray
>> being a particularly annoying example!)
>>
>
> You're completely right I think. We have had problems with subclasses for
> a long time, but that is mainly due to np.matrix being badly behaved, which
> then led to code everywhere using asarray, which then led to lots of issues
> with other subclasses. This basically meant subclasses were problematic,
> and hence most numpy devs would like to not see more subclasses.
>

Perhaps this history is in fact useful to mention? To learn from mistakes,
it must be possible to know about them!
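
A minimal sketch of that dynamic, for readers who have not run into it (assuming a recent numpy; `Tagged` is a hypothetical stand-in for a subclass such as Quantity, not a numpy class): `np.matrix` redefines `*` as a matrix product, so library code defensively converts inputs with `np.asarray`, which then strips every other subclass too. `np.asanyarray` is the variant that lets subclasses through.

```python
import numpy as np

# np.matrix changes the meaning of `*`: it is a matrix product, not elementwise.
m = np.matrix([[1, 2], [3, 4]])
matrix_product = m * m                        # [[7, 10], [15, 22]]
elementwise = np.asarray(m) * np.asarray(m)   # [[1, 4], [9, 16]]


class Tagged(np.ndarray):
    """Hypothetical, trivial ndarray subclass (stand-in for e.g. a Quantity)."""


t = np.arange(3).view(Tagged)

# np.asarray converts to a base-class ndarray, so the subclass is lost...
stripped = np.asarray(t)
# ...while np.asanyarray returns ndarray subclasses as-is.
kept = np.asanyarray(t)

print(type(stripped).__name__, type(kept).__name__)  # ndarray Tagged
```

Guarding against the first behaviour with `asarray` is exactly what causes the second.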


>
>> The subclassing section also notes that subclassing has been discouraged
>> for a long time. Is that so? Over time, I've certainly had comments from
>> Nathaniel and some others in discussions of PRs that go in that direction,
>> which perhaps reflected some internal consensus I wasn't aware of,
>>
>
> I think yes, there is a vague, unwritten near-consensus, due
> to the dynamic with asarray above.
>
>
>> but the documentation does not seem to discourage it (check, e.g., the
>> subclassing section [1]). I also think that it may be good to keep in mind
>> that until `__array_ufunc__`, there wasn't much of a choice - support for
>> duck arrays was even more half-hearted (hopefully to become much better
>> with `__array_function__`).
>>
>
> True. I think in the long term duck arrays are the way to go, because asarray
> is not going to disappear. But for now we just have to do the best we can
> dealing with subclasses.
>
> The subclassing doc [1] really needs an update on what the practical
> issues are.
>
> Indeed.
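
For context on the duck-array route, a minimal sketch of what `__array_ufunc__` (available since numpy 1.13) allows: a class that holds an ndarray instead of subclassing one, yet still takes part in ufunc calls. `Wrapped` is a hypothetical name; `NDArrayOperatorsMixin` from `numpy.lib.mixins` supplies the arithmetic operators by routing them through ufuncs. Non-ufunc functions such as `np.mean` are outside this protocol, which is the gap `__array_function__` is meant to fill.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class Wrapped(NDArrayOperatorsMixin):
    """Hypothetical duck array: holds an ndarray instead of subclassing one."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch handles only plain, single-output calls such as np.add(x, y);
        # returning NotImplemented makes numpy raise TypeError for anything else.
        if method != "__call__" or ufunc.nout != 1 or "out" in kwargs:
            return NotImplemented
        # Unwrap any Wrapped inputs, apply the ufunc to the bare arrays, re-wrap.
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*arrays, **kwargs))

    def __repr__(self):
        return f"Wrapped({self.data!r})"


x = Wrapped([1.0, 4.0, 9.0])
print(np.sqrt(x))  # a Wrapped instance, not an ndarray
print(x + 1)       # the mixin turns `+` into np.add, which dispatches here
```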


>
>> Overall, it seems to me that these days in the Python ecosystem
>> subclassing is simply expected to work. Even within numpy there are other
>> examples (e.g., ufuncs, dtypes) for which there has been quite a bit of
>> discussion about the benefits subclasses would bring.
>>
>
> I'm now wondering what to do with the subclassing section in the NEP. Would
> it be best to remove it completely? I was prompted to include it by some
> things Stephan said last week about subclasses being a blocker to adding new
> features. So if we keep the section, it would be helpful if you and Stephan
> could help shape it.
>
> I think even just the history you wrote above is useful.

Before suggesting further specific text, might it make sense for the NEP to
note that since subclassing will not go away, there is value in having at
least one non-trivial, well-designed subclass in numpy? I think eventually
MaskedArray might become that: it would be an internal check that
subclasses can work with all numpy functions (there is no reason for
duplication of functions in `np.ma`!). It is also an example of a
container-type subclass that adds extra information to an ndarray (since
that information is itself array-like, it is not necessarily a
super-logical subclass, but it is there... and can thus serve as an
example).
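
As an illustration of the container-type pattern, a minimal sketch similar in spirit to the examples in the subclassing documentation: an ndarray subclass carrying one extra attribute, with `__array_finalize__` making sure the attribute survives views and slices. This is a toy, not how MaskedArray is actually implemented; `InfoArray` and its `info` attribute are hypothetical.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical container-type subclass: an ndarray plus one extra attribute."""

    def __new__(cls, input_array, info=None):
        # Reinterpret the input as this subclass, then attach the extra information.
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Runs whenever numpy creates a new InfoArray from an existing array
        # (views, slices, ...); obj is None only when ndarray.__new__ is called
        # directly, in which case there is nothing to copy from.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


a = InfoArray(np.arange(6), info="metres")
s = a[2:4]  # a slice is a view, so it stays an InfoArray
print(type(s).__name__, s.info)  # InfoArray metres
```

Note that `np.asarray(a)` would still return a plain ndarray without `info`, which is the problem discussed above.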

A second subclass which we have not discussed, but which I think is used
quite a bit (from my statistics of one...), is `np.memmap`. Useful if only
for showing that a relatively quick hack can give you something quite
helpful.
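
To illustrate, a minimal sketch of typical `np.memmap` use (the temporary file name and the shape are arbitrary choices for the example): the result is an ndarray subclass, so existing code accepts it, while its data live in a file on disk rather than in memory.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.dat")

# Create (or overwrite) a file-backed array and fill it.
mm = np.memmap(path, dtype="float64", mode="w+", shape=(4,))
mm[:] = [1.0, 2.0, 3.0, 4.0]
mm.flush()  # make sure the changes are written to the file

# Reopen read-only; the values are read back from the file.
ro = np.memmap(path, dtype="float64", mode="r", shape=(4,))
print(isinstance(ro, np.ndarray))  # True
print(ro.sum())  # 10.0
```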

All the best,

Marten