On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root@gmail.com> wrote:
Duck typing is great and all for classes that implement some or all of the ndarray interface... but remember what the main reason for asarray() and asanyarray() is: to automatically promote lists, tuples, and other "array-likes" to ndarrays. Ignoring the use case of lists of lists is problematic at best.
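A minimal sketch of the promotion use case. Both function names are hypothetical and only illustrate the point. A purely duck-typed function that calls x.mean() fails on a list of lists, while calling np.asanyarray first promotes the list and still lets ndarray subclasses such as masked arrays pass through unchanged.

```python
import numpy as np

def center_ducktyped(x):
    # Hypothetical: assumes x already has an ndarray-style .mean();
    # a plain list of lists has no such method and raises AttributeError.
    return x - x.mean(axis=0)

def center_promoting(x):
    # Hypothetical: np.asanyarray turns lists/tuples into ndarrays but
    # returns ndarray subclasses (e.g. masked arrays) as they are.
    x = np.asanyarray(x)
    return x - x.mean(axis=0)

data = [[1.0, 2.0], [3.0, 4.0]]
try:
    center_ducktyped(data)
except AttributeError as exc:
    print("duck-typed version fails:", exc)
print(center_promoting(data))
```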

Ben Root


On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
My 2¢ here is that all code should feel free to assume a certain type of
input, as long as it is documented properly, but there is no reason to
enforce that by, e.g., putting `asarray` everywhere. Then, for some
pieces, ducktypes and subclasses will just work like magic, and uses
you might never have foreseen become possible. For others, whoever
wants to use them has to do some work (and it is up to the package
maintainers to decide whether or not to accept PRs that implement hooks, etc.)
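A minimal sketch of the "document, don't enforce" approach. rescale is a hypothetical function whose docstring states the assumed interface instead of calling np.asarray. With a masked array as input, the mask survives and masked entries are excluded from min() and max(). Forcing the input through np.asarray loses that behaviour.

```python
import numpy as np

def rescale(x, lo=0.0, hi=1.0):
    """Linearly map x onto [lo, hi].

    x is assumed (documented, not enforced) to support ndarray-style
    arithmetic plus .min() and .max(); no np.asarray call is made.
    """
    xmin, xmax = x.min(), x.max()
    return (x - xmin) / (xmax - xmin) * (hi - lo) + lo

x = np.ma.masked_array([0.0, 5.0, 10.0, 1000.0], mask=[False, False, False, True])
print(rescale(x))              # still a masked array; 1000.0 stays masked
print(rescale(np.asarray(x)))  # mask discarded, so 1000.0 dominates the range
```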

I do see the argument that this way one becomes constrained in the
internal implementation, since a change may break an outward-looking
function. While at times this may be inconvenient, in my experience
it can also make one realize that an even better implementation is
possible. But then, I really like duck-typing...

One general problem is that there is no protocol specifying which operations those ducks implement in a way equivalent to a numpy ndarray, i.e., whether they quack in a compatible way.

One small example: pandas' standard deviation, std, used ddof=1 by default, instead of the ddof=0 that numpy uses, and didn't have an option to override it. So even though we could call the std method of the ducks, the t-test results would differ depending on the type of the data: slightly in general, and visibly in small samples. A possible alternative would be to compute std from scratch and forgo the available function or method.
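A minimal sketch of the "compute std from scratch" alternative, assuming the data can be converted with np.asarray. The helper name std_from_scratch is hypothetical. Passing ddof explicitly makes the result independent of whatever default a duck's own std method happens to use.

```python
import numpy as np

def std_from_scratch(x, ddof=0):
    # Hypothetical helper: standard deviation along axis 0 with an explicit
    # ddof, instead of calling x.std() and inheriting its default.
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    dev = x - x.mean(axis=0)
    return np.sqrt((dev ** 2).sum(axis=0) / (n - ddof))

sample = [1.0, 2.0, 3.0, 4.0]
print(std_from_scratch(sample, ddof=0))  # same as np.std(sample)
print(std_from_scratch(sample, ddof=1))  # same as np.std(sample, ddof=1), pandas' default
```

In a sample of four, the two values differ by a factor of sqrt(4/3), which is the kind of visible small-sample discrepancy described above.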

I once tried, in the scipy.zscore function, to be agnostic about the type and not use asarray. It's a simple operation, but it still required special handling of numpy matrices, because they preserve the dimension in reduce operations. After more than a few lines it becomes difficult to keep track of which type is now being used.
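A minimal sketch of the kind of special handling meant here. zscore_sketch is hypothetical and is not scipy's actual implementation. Reductions on an ndarray drop the reduced axis, while reductions on an np.matrix keep it (a matrix is always 2-D), so the type-agnostic version has to check which case it is in before broadcasting.

```python
import warnings
import numpy as np

def zscore_sketch(a, axis=0, ddof=0):
    # Hypothetical type-agnostic z-score: no np.asarray, so ndarray
    # subclasses keep their type. Relies on a.mean and a.std methods.
    mns = a.mean(axis=axis)
    sstd = a.std(axis=axis, ddof=ddof)
    # ndarray reductions drop the reduced axis, np.matrix reductions keep it,
    # so re-insert the axis only when it was actually dropped.
    if mns.ndim < a.ndim:
        mns = np.expand_dims(mns, axis=axis)
        sstd = np.expand_dims(sstd, axis=axis)
    return (a - mns) / sstd

arr = np.arange(6.0).reshape(2, 3)
with warnings.catch_warnings():
    # Creating an np.matrix emits a PendingDeprecationWarning.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(arr)

print(arr.mean(axis=1).shape, mat.mean(axis=1).shape)  # (2,) (2, 1)
print(type(zscore_sketch(mat, axis=1)).__name__)       # matrix
```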

Another subclass that default code often breaks is masked arrays, because asarray throws away the mask.
But asanyarray wouldn't always work either, because the masked values need dedicated handling code. For example, scipy.stats ended up with separate functions for masked arrays.
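A minimal sketch of both failure modes. naive_median is a hypothetical "generic" function. np.asarray silently drops the mask, so the masked value is included. np.asanyarray keeps the MaskedArray, but index arithmetic that counts slots still treats masked entries as data. A mask-aware routine such as np.ma.median is needed for the correct answer.

```python
import numpy as np

def naive_median(x):
    # Hypothetical generic median: keeps subclasses via asanyarray, but the
    # index arithmetic below counts masked entries as if they were data.
    s = np.sort(np.asanyarray(x))
    n = s.shape[0]
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

x = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])

print(np.asarray(x).mean())     # mask dropped: the masked 100.0 is included
print(np.asanyarray(x).mean())  # MaskedArray.mean skips masked values: 1.5
print(naive_median(x))          # 2.0, the middle of three slots, one of them masked
print(np.ma.median(x))          # mask-aware median of [1.0, 2.0]: 1.5
```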

Josef

 

-- Marten
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

