On Nov 6, 2017 4:19 PM, "Chris Barker" <chris.barker@noaa.gov> wrote:
On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:

You just summarized excellently why I'm on a quest to change `asarray`
to `asanyarray` within numpy

+1 -- we should all be using asanyarray() most of the time. 

The problem is that if you use 'asanyarray', then you're claiming that your code works correctly for:
- regular ndarrays
- np.matrix
- np.ma masked arrays
- and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not

If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarray subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible.
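
To make the "different public API" point concrete, here is a minimal sketch using np.matrix. The helper name mean_of_squares is made up for illustration; it is not from numpy or any project discussed here. The same line of code silently means something different once a subclass is let through:

```python
import warnings
import numpy as np

def mean_of_squares(x, convert):
    """Hypothetical helper: intended to return the mean of the elementwise squares."""
    a = convert(x)
    return float((a * a).mean())

# np.matrix emits a PendingDeprecationWarning on construction; silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix([[1.0, 2.0], [3.0, 4.0]])

# asarray converts to a plain ndarray, so `*` is elementwise: (1 + 4 + 9 + 16) / 4
elementwise = mean_of_squares(m, np.asarray)

# asanyarray passes the matrix through, so `*` is a matrix product:
# [[7, 10], [15, 22]], whose mean is 54 / 4
matrix_product = mean_of_squares(m, np.asanyarray)

print(elementwise, matrix_product)  # 7.5 13.5
```

Neither call raises; the asanyarray version just computes something the author of the helper never intended.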

The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray. In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to move units into dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.). In the meantime, let's try to be sympathetic to other projects that are doing their best :-).
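
For the other side of the tradeoff, a small sketch with a masked array, again using a made-up helper (mean_of_squares is hypothetical, purely for illustration). Here it is asarray that gives the surprising answer, because the conversion drops the mask:

```python
import numpy as np

def mean_of_squares(x, convert):
    """Hypothetical helper: intended to return the mean of the elementwise squares."""
    a = convert(x)
    return float((a * a).mean())

# The third value is masked out, i.e. flagged as invalid by the caller.
masked = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])

# asarray returns the underlying data and discards the mask,
# so the invalid 100.0 is included: (1 + 4 + 10000) / 3
dropped_mask = mean_of_squares(masked, np.asarray)

# asanyarray keeps the MaskedArray, so the masked entry is skipped: (1 + 4) / 2
kept_mask = mean_of_squares(masked, np.asanyarray)

print(dropped_mask, kept_mask)  # 3335.0 2.5
```

So for this helper asanyarray happens to do the right thing for np.ma and the wrong thing for np.matrix, which is the kind of per-subclass judgment call described above.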