On Tue, Jul 2, 2019 at 1:45 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
On Tue, 2 Jul 2019, at 4:34 PM, Stephan Hoyer wrote:
This is addressed in the NEP, see bullet 1 under "Partial implementation of NumPy's API":

My concern is that fallback coercion behavior makes it difficult to reliably implement "strict" overrides of NumPy's API. Fallback coercion is certainly useful for interactive use, but it isn't really appropriate for libraries.

Do you mean "fallback coercion in NumPy itself", or "at all"? Right now there's lots of valid code around, e.g. `np.median(some_dask_array)`. Users will keep wanting to do that. Forcing everyone to write  `np.median(np.array(some_dask_array))` serves no purpose. So the coercion has to be somewhere. You're arguing that that's up to Dask et al I think?

Putting it in Dask right now still doesn't address Juan's backwards compat concern, but perhaps that could be bridged with a Dask bugfix release and some short-lived pain. 

I'm not convinced that this shouldn't be fixed in NumPy though. Your concern "reliably implement "strict" overrides of NumPy's API" is a bit abstract. Overriding the _whole_ NumPy API is definitely undesirable. If we'd have a reference list somewhere about every function that is handled with __array_function__, then would that address your concern? Such a list could be auto-generated fairly easily.

In contrast to putting this into NumPy, if a library like dask prefers to issue warnings or even keep around fallback coercion indefinitely (not that I would recommend it), they can do that by putting it in their __array_function__ implementation.

I get the above concerns, and thanks for bringing them up, Stephan, as I'd only skimmed the NEP the first time around and missed them. Nevertheless, the fact is that the current behaviour breaks user code that was perfectly valid until NumPy 1.16, which seems, well, insane. So, warning for a few versions followed raising seems like the only way forward to me. The NEP explicitly states “We would like to gain experience with how __array_function__ is actually used before making decisions that would be difficult to roll back.” I think that this breakage *is* that experience, and the decision right now should be not to break user code with no warning period.

I'm also wondering where the list of functions that must be implemented can be found, so that libraries like dask and CuPy can be sure that they have a complete implementation, and further typeerrors won't be raised with their arrays.

This is one of the reasons I'm working on https://github.com/Quansight-Labs/rnumpy. It doesn't make sense for any library to copy the whole NumPy API, it's way too large with lots of stuff in there that's only there for backwards compat and has a better alternative or shouldn't be in NumPy in the first place.

Cheers,
Ralf