[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Sebastian Berg sebastian at sipsolutions.net
Tue Jun 11 11:52:25 EDT 2019


On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
> 
> 
> On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> > Hi All,
> > 
> > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > to use the new `where` argument for reductions to implement
> > `nansum`, etc., using simplifications that boil down to
> > `np.sum(..., where=~isnan(...))`.
> > 
> > A problem that occurs is that `np.sum` will use a `.sum` method if
> > that is present, and for matrix, the `.sum` method does not take a
> > `where` argument. Since the `where` argument has been introduced
> > only recently, similar problems may happen for array mimics that
> > implement their own `.sum` method.
> > 
> > The question now is what to do, with options being:
> > 1. Let's stick with the existing implementation; the speed-up is
> > not that great anyway.
> > 2. Use try/except around the new implementation and use the old one
> > if it fails.
> > 3. As (2), but emit a deprecation warning. This will help array
> > mimics, but not matrix (unless someone makes a PR; would we even
> > accept it?);
> > 4. Use the new implementation. `matrix` should be gone anyway and
> > array mimics can either update their `.sum()` method or override
> > `np.nansum` with `__array_function__`.
> > 
> > Personally, I'd prefer (4), but realize that (3) is probably the
> > more safer approach, even if it is really annoying to still be
> > taking into account matrix deficiencies.
> > 
> 
> Honestly, I agree with Tyler's assessment when he closed the PR: 
> "there's not much evidence of remarkable performance improvement, and
> many of the other nan functions would have more complicated
> implementation details that are unlikely to make the relative
> performance much better. It doesn't seem to be a priority either."
> 
> The code also got more rather than less complex. So why do this? Yes
> you give a reason why it may help duck arrays, but it also breaks
> things apparently.

There could probably be reasons for this, since replacing with 0 may
not work always for object arrays. But that is rather hypothetical
(e.g. using add as string concatenation).

Calling `np.add.reduce` instead of the sum method could be slightly
more compatible, but could have its own set of issue.
But I suppose it is true, unless we plan on adding specialized `where`
loops or a nanadd ufunc itself the gain is likely so small that it it
arguably may make more sense to defer this until there is an an actual
benefit (and less pain due to adoption of the new protocols).

Best,

Sebastian

> 
> Re matrix: it's not deprecated yet and is still fairly widely used,
> so yes we'd accept a PR and no we should not break it gratuitously.
> 
> Cheers,
> Ralf
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/f0c61cec/attachment-0001.sig>


More information about the NumPy-Discussion mailing list