On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer <shoyer@gmail.com> wrote:

On Thu, Nov 2, 2017 at 9:45 AM <josef.pktd@gmail.com> wrote:
Similarly, scipy.special has ufuncs.
What units would those have?

Most code that I know (i.e. scipy.stats and statsmodels) does not use only
"normal mathematical operations with ufuncs".
I guess there are a lot of "abnormal" mathematical operations
where simply propagating the units will not work.
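
To make the units point concrete, here is a minimal sketch in plain Python. The Quantity class and the log() helper are hypothetical and written only for this illustration; they are not the API of any existing units package. Multiplication propagates units mechanically, but a transcendental function such as a logarithm has no sensible unit to return for a dimensional input.

```python
import math


class Quantity:
    """Hypothetical toy quantity: a float plus a dict of unit exponents."""

    def __init__(self, value, units):
        self.value = value
        # Drop zero exponents so a dimensionless quantity has empty units.
        self.units = {u: p for u, p in units.items() if p != 0}

    def __mul__(self, other):
        # Multiplication propagates units by adding exponents.
        units = dict(self.units)
        for unit, power in other.units.items():
            units[unit] = units.get(unit, 0) + power
        return Quantity(self.value * other.value, units)


def log(q):
    """Hypothetical unit-aware log: only defined for dimensionless input."""
    if q.units:
        raise ValueError(f"log() of a quantity with units {q.units} is undefined")
    return Quantity(math.log(q.value), {})


length = Quantity(2.0, {"m": 1})
area = length * length                      # units propagate: m**2
ratio = length * Quantity(0.5, {"m": -1})   # units cancel: dimensionless
log_ratio = log(ratio)                      # fine, input is dimensionless

try:
    log(length)                             # no meaningful unit for log(2 m)
    log_failed = False
except ValueError:
    log_failed = True
```

A library full of special functions has to make a decision like this for every function, which is why propagation alone does not get very far.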

Aside: the problem is more general and also applies to other data structures.
E.g. statsmodels for the most part uses only numpy ndarrays inside the
algorithms and computations because that provides well-defined
behavior (e.g. pandas behaved too differently in many cases).
I don't have much of an idea yet about how to change the infrastructure to
allow the use of dask arrays, sparse matrices and similar, and possibly
automatic differentiation.

This is the exact same reason why pandas and xarray do not support wrapping arbitrary ndarray subclasses or duck array types. The operations we use internally (on numpy.ndarray objects) may not be what you would expect externally, and may even be implementation details not considered part of the public API. For example, in xarray we use numpy.nanmean() or bottleneck.nanmean() instead of numpy.mean().
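
A small illustration of why the choice of internal function matters. This is a sketch using only public NumPy functions, not xarray's actual code path: the two reductions disagree as soon as the data contains a NaN, so a wrapped array type would need to implement whichever one is called internally.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

# numpy.mean propagates the NaN, so the result is NaN.
plain = np.mean(data)

# numpy.nanmean skips NaN entries and averages the remaining values.
skipna = np.nanmean(data)

print(plain, skipna)
```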

For NumPy and xarray, I think we could (and should) define an interface to support subclasses and duck types for generic operations for core use-cases. My main concern with subclasses / duck-arrays is undefined/untested behavior, especially where we might silently give the wrong answer or trigger some undesired operation (e.g., loading a lazily computed array into memory) rather than raising an informative error. Leaking implementation details is another concern: we have already had several cases in NumPy where a function only worked on a subclass if a particular method was called internally, and broke when that was changed.
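
As a sketch of the "silently wrong answer" failure mode, using numpy.ma as a stand-in for an arbitrary subclass: coercing with np.asarray drops the mask without any warning. The strict_mean helper below is hypothetical, shown only to contrast that with raising an informative error; it is not an existing NumPy or xarray function.

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 1000.0], mask=[False, False, True])

# The subclass's own mean respects the mask: (1 + 2) / 2.
expected = masked.mean()

# np.asarray returns a base ndarray of the underlying data, so the
# masked value 1000.0 silently enters the result.
silent = np.asarray(masked).mean()


def strict_mean(values):
    """Hypothetical helper: accept exact ndarrays only, otherwise raise."""
    if type(values) is not np.ndarray:
        raise TypeError(
            f"strict_mean() does not support {type(values).__name__}; "
            "convert to numpy.ndarray explicitly"
        )
    return values.mean()


try:
    strict_mean(masked)
    raised = False
except TypeError:
    raised = True
```

Here the coerced result is off by more than two orders of magnitude and nothing signals it, whereas the strict version fails loudly at the boundary.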

Would this issue be ameliorated given Nathaniel's proposal to try to move away from subclasses and towards storing data in dtypes? Or would that just mean that xarray would need to ban dtypes it doesn't know about?

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion