![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
Hi All,

I will be running the NumPy sprint at SciPy 2017 and I'm trying to put together a suitable list of things to sprint on. In my experience, sprinting on NumPy is hard: enhancements generally need lengthy review, and even finding and doing simple bug fixes can take time. What I have in mind at this point, apart from what might be a getting started tutorial, could mostly be classified as janitorial work.

1. Triage issues and close those that no longer apply. This is mind-numbing work, but it has been almost three years since the last pass.
2. Move the contents of `numpy/doc` into `doc/source` and make them normal `*.rst` files.
3. Convert the doctest in `numpy/lib/tests/test_polynomial.py` to regular tests. Might be tricky as it mostly checks print formatting.
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Thu, Jun 29, 2017, at 11:09, Charles R Harris wrote:
Here's a random idea: how about building a NumPy gallery? scikit-{image,learn} have one, and while those projects may have more visual datasets, I can imagine something along the lines of Nicolas Rougier's beautiful book: http://www.labri.fr/perso/nrougier/from-python-to-numpy/ Stéfan
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Thu, Jun 29, 2017 at 12:15 PM, Stefan van der Walt <stefanv@berkeley.edu> wrote:
So that would be added in the numpy/numpy.org <https://github.com/numpy/numpy.org> repo? Chuck
![](https://secure.gravatar.com/avatar/da3a0a1942fbdc5ee9a9b8115ac5dae7.jpg?s=120&d=mm&r=g)
On 29.06.2017 at 20:45, Charles R Harris wrote:
Or https://scipy-cookbook.readthedocs.io/ ? (maybe minus the bitrot and with images added :)
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Fri, Jun 30, 2017 at 6:50 AM, Pauli Virtanen <pav@iki.fi> wrote:
I'd like the numpy.org one. numpy.org is now incredibly sparse and ugly; a gallery would make it look a lot better. Another idea, from the "deprecate np.matrix" discussion: add numpy documentation describing the preferred way to handle matrices, extolling the virtues of @, and move np.matrix documentation to a deprecated section. Ralf
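As a rough illustration of what such documentation might show, here is a minimal sketch of matrix-style work on plain `ndarray` objects with the `@` operator. The array values and variable names are made up for the example; only `@` and `*` on `ndarray` are taken as given.

```python
import numpy as np

# Plain 2-D ndarrays stand in for what used to be np.matrix objects.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# "@" is the matrix product; "*" stays element-wise on ndarrays.
product = a @ b
elementwise = a * b

# A 1-D operand is treated as a vector, and the result is 1-D as well,
# whereas np.matrix would always return something 2-D.
v = np.array([1.0, 1.0])
matvec = a @ v
```

With `ndarray` the two operations have distinct spellings, so code does not change meaning depending on whether it was handed a matrix or an array.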
![](https://secure.gravatar.com/avatar/697900d3a29858ea20cc109a2aee0af6.jpg?s=120&d=mm&r=g)
Just a heads-up. There is now a sphinx-gallery plugin. Matplotlib and a few other projects have migrated their docs over to use it. https://sphinx-gallery.readthedocs.io/en/latest/ Cheers! Ben Root On Sat, Jul 1, 2017 at 7:12 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/71832763447894e7c7f3f64bfd19c13f.jpg?s=120&d=mm&r=g)
On 07/02/2017 10:03 AM, Charles R Harris wrote:
The new doctest runner suggested in the printing thread? This is to ignore whitespace and precision in ndarray output. I can see an argument for distributing it in numpy if it is designed to be specially aware of ndarrays or numpy scalars (e.g. to test equality between 'want' and 'got') Allan
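To make the idea concrete, here is a minimal sketch of a whitespace- and precision-tolerant checker built on the standard library's `doctest.OutputChecker`. It is not the runner proposed in the printing thread: the class name `TolerantOutputChecker`, the `rtol` parameter and the regular expression are all assumptions made for illustration, and it knows nothing special about ndarrays.

```python
import doctest
import re

# Matches plain decimal numbers such as "1.", "2.50", ".5" or "1e-05".
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


class TolerantOutputChecker(doctest.OutputChecker):
    """Hypothetical checker: ignore whitespace, compare numbers approximately."""

    def __init__(self, rtol=1e-6):
        self.rtol = rtol

    def check_output(self, want, got, optionflags):
        # Accept anything the stock checker already accepts.
        if super().check_output(want, got, optionflags):
            return True
        want_nums = _NUMBER.findall(want)
        got_nums = _NUMBER.findall(got)
        # The non-numeric text must agree once all whitespace is dropped.
        want_text = "".join(_NUMBER.sub("", want).split())
        got_text = "".join(_NUMBER.sub("", got).split())
        if want_text != got_text or len(want_nums) != len(got_nums):
            return False
        # Each pair of numbers must agree to within the relative tolerance.
        for w, g in zip(want_nums, got_nums):
            w, g = float(w), float(g)
            if abs(w - g) > self.rtol * max(abs(w), abs(g), 1.0):
                return False
        return True


checker = TolerantOutputChecker()
same = checker.check_output(
    "array([ 1.        ,  2.5       ])\n", "array([1. , 2.50000001])\n", 0
)
different = checker.check_output("array([1., 2.])\n", "array([1., 3.])\n", 0)
```

An instance can be passed to `doctest.DocTestRunner(checker=...)`, which is the hook the standard library provides for replacing the comparison step.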
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Sun, 2017-07-02 at 10:49 -0400, Allan Haldane wrote:
I don't really feel it is very numpy specific or should be under the numpy umbrella (I mean, if there is no other spot, I guess it could live on the numpy github page). It's about as numpy specific as the gallery sphinx extension is probably matplotlib specific.... That doesn't mean that it might not be a good sprint, though :).

The question to me is a bit what those who actually go there want from it, or do a few people who know numpy/scipy already plan to come? Two years ago, we did not have much of a plan, so it was mostly giving three people or so a bit of a tutorial of how numpy worked internally, leading to some bug fixes.

One quick idea that might be nice and dives a bit into the C-layer (might be nice if there is no big topic with a few people working on it):

* Find places that should have the new memory overlap detection and implement it there.

If someone who does subclasses/array-likes or so (e.g. like Stephan Hoyer ;)) is interested, and we also do some teleconferencing/chatting (and I have time).... I might be interested in discussing and possibly trying to develop the new indexer ideas, which I feel are pretty far along, but I got stuck on how to get subclasses right.

- Sebastian
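The sprint idea concerns the C layer, but the Python-level functions `np.shares_memory` and `np.may_share_memory` show what overlap detection means. The sketch below is only an illustration: the helper `add_in_place` is hypothetical and is not how NumPy handles overlap internally.

```python
import numpy as np

x = np.arange(10)

# Two views offset by one element really do overlap in memory.
left, right = x[:-1], x[1:]
overlapping = np.shares_memory(left, right)

# Interleaved views have overlapping bounds but no element in common:
# the cheap bounds check says "maybe", the exact check says "no".
evens, odds = x[::2], x[1::2]
bounds_overlap = np.may_share_memory(evens, odds)
exact_overlap = np.shares_memory(evens, odds)


def add_in_place(dst, src):
    """Hypothetical helper: copy the source first if it aliases the output."""
    if np.shares_memory(dst, src):
        src = src.copy()
    np.add(dst, src, out=dst)
    return dst


y = np.arange(5)
add_in_place(y[1:], y[:-1])  # y[1:] += y[:-1], with the old values of y
```

Copying only when the exact check reports an overlap keeps the common non-aliased case free of temporaries.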
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sun, Jul 2, 2017 at 9:33 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
I've opened an issue for Pytests <https://github.com/numpy/numpy/issues/9352> and given it a "Scipy2017 Sprint" label. I'd be much obliged if the folks with suggestions here would open other issues and also label them with "Scipy2017 Sprint". Note that these issues are not Scipy 2017 specific; they could be used in other contexts, but I thought it might be useful to collect them in one spot and give them some structure together with suggestions on how to proceed. Ralf, you have made several previous suggestions on bringing over some of the scipy tests to numpy, including documentation testing. Were there any other tests we should look into? Chuck
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Mon, Jul 3, 2017 at 7:01 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Better platform test coverage would be a useful topic if someone is willing to work on that. NumPy needs OS X testing enabled on TravisCI, SciPy needs OS X and a 32-bit test (steal from NumPy). And if someone really feels ambitious: replace ATLAS by OpenBLAS in one of the test matrix entries. Ralf
![](https://secure.gravatar.com/avatar/871426dddc1a9f702316c1ca03a33d9b.jpg?s=120&d=mm&r=g)
Note that TravisCI does not yet have official Python support on Mac OS X, https://github.com/travis-ci/travis-ci/issues/2312 I believe it is possible to do anyway by faking it under another setting (e.g. pretend to be a generic language build, and use the system Python or install your own specific version of Python as needed), so that may be worth trying during a sprint. Peter On Wed, Jul 5, 2017 at 10:43 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Wed, Jul 5, 2017 at 10:14 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
That approach has worked reliably for https://github.com/MacPython/numpy-wheels for a while now, so should be straightforward. Ralf
![](https://secure.gravatar.com/avatar/b4929294417e9ac44c17967baae75a36.jpg?s=120&d=mm&r=g)
On Wed, Jul 5, 2017 at 11:25 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
And https://travis-ci.org/MacPython/scipy-wheels where we are testing OSX, 64 and 32 bit manylinux builds daily. That didn't catch the recent ndimage error because I'd disabled the 32-bit builds there. Numpy, scipy, and a fairly large number of other projects use https://github.com/matthew-brett/multibuild to set up builds in this way for manylinux, OSX and (with a bit more effort) Windows. Cheers, Matthew
![](https://secure.gravatar.com/avatar/871426dddc1a9f702316c1ca03a33d9b.jpg?s=120&d=mm&r=g)
On Wed, Jul 5, 2017 at 11:25 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for that link - I'm going off topic but the MacPython wiki page goes into more background about how they build wheels for PyPI which I'm very interested to read up on: https://github.com/MacPython/wiki/wiki/Spinning-wheels Peter
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
Lots of good ideas here. It would help if issues were opened for them and flagged with the sprint label. I'll be doing some myself, but I'm not as intimately familiar with some of the topics as the proposers are. <snip> Chuck
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sun, Jul 2, 2017 at 8:33 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
I am of course very happy to discuss this (online or via teleconference, sadly I won't be at scipy), but to be clear I use array likes, not subclasses. I think Marten van Kerkwijk is the last one who thinks that is still a good idea :).
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Mon, Jul 3, 2017 at 4:27 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Indeed -- I thought the community more or less had decided that duck-typing was THE way to make something that could be plugged in where a numpy array is expected. Along those lines, there was some discussion of having a set of utilities (or maybe even an ABC?) that would make it easier to create an ndarray-like object. That is, the boilerplate needed for multi-dimensional indexing and slicing, etc... That could be a nice little sprint-able project. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Wed, Jul 5, 2017 at 10:40 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Indeed. Let me highlight a few mixins <https://github.com/pydata/xarray/blob/6a20f917041abf53bcb35e210d59f5b3312110...> that I wrote for xarray that might be more broadly useful. The challenge here is that there are quite a few different meanings to "ndarray-like", so mixins really need to be mix-and-match-able. But at least defining a base list of methods to implement/override would be useful. In NumPy, this could go along with NDArrayOperatorsMixin in numpy/lib/mixins.py <https://github.com/numpy/numpy/blob/14cd918c651d72f4c2a8681093e114f01d5bdc36...>
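For readers who have not seen the NumPy mixin, here is a minimal sketch of an array-like built by composition on top of `numpy.lib.mixins.NDArrayOperatorsMixin`. The class `WrappedArray` and its `data` attribute are hypothetical names chosen for this example; this is neither the xarray code linked above nor a complete duck array.

```python
import numbers

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class WrappedArray(NDArrayOperatorsMixin):
    """Hypothetical duck array that holds an ndarray instead of subclassing it."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def shape(self):
        return self.data.shape

    def __getitem__(self, key):
        # Delegate indexing to the wrapped ndarray and re-wrap the result.
        return type(self)(self.data[key])

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: no support for the "out" argument.
        if kwargs.get("out") is not None:
            return NotImplemented
        unwrapped = []
        for x in inputs:
            if isinstance(x, WrappedArray):
                unwrapped.append(x.data)
            elif isinstance(x, (np.ndarray, numbers.Number)):
                unwrapped.append(x)
            else:
                return NotImplemented
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        if result is None:
            return None
        if isinstance(result, tuple):
            return tuple(type(self)(r) for r in result)
        return type(self)(result)


w = WrappedArray([1, 2, 3])
plus_one = w + 1            # operator supplied by the mixin, routed via np.add
doubled = 2 * w             # reflected operators work too
roots = np.sqrt(WrappedArray([4.0, 9.0]))
```

The mixin supplies the arithmetic and comparison operators; everything else that "ndarray-like" might mean (shape methods, indexing, reductions) still has to be written or mixed in separately, which is the mix-and-match problem described above.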
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Thu, Jul 6, 2017 at 4:42 AM, Ben Rowland <bennyrowland@mac.com> wrote:
Writing such docs (especially to explain how to write array-like objects that aren't subclasses) would be another good topic for the sprint ;). But more seriously: numpy.ndarray subclasses are supported, but inherently error prone, because we don't have a well defined subclassing API. As Marten will attest, this means seemingly harmless internal refactoring in NumPy has a tendency to break downstream subclasses, which often unintentionally end up relying on untested implementation details. This is particularly problematic when subclasses are implemented in a different code-base, as is the case for user subclasses of numpy.ndarray. Due to diligent testing efforts, we often (but not always) catch these issues before making a release, but the process is inherently error prone. Writing NumPy functionality in a manner that is robust to all possible subclassing approaches turns out to be very difficult (nearly impossible). This is actually a classic OOP problem, e.g., see https://en.wikipedia.org/wiki/Composition_over_inheritance
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Wed, Jul 5, 2017 at 11:05 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
At a quick glance, that is exactly the kind of thing I had in mind.

> The challenge here is that there are quite a few different meanings to "ndarray-like", so mixins really need to be mix-and-match-able.

Exactly!

> But at least defining a base list of methods to implement/override would be useful.

With sample implementations, even... at least for parts of it -- I'm thinking things like parsing out the indexes/slices in __getitem__ -- that sort of thing.

Yes! I had no idea that existed.

-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
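As an example of the kind of `__getitem__` boilerplate mentioned here, the sketch below expands an indexing key into one entry per dimension. The function `normalize_index` is a hypothetical helper, and it assumes the key contains only integers, slices and `Ellipsis`; `None`/`np.newaxis`, boolean masks and integer arrays are deliberately not handled.

```python
def normalize_index(key, ndim):
    """Hypothetical helper: expand ``key`` to a tuple with ``ndim`` entries.

    Only integers, slices and Ellipsis are supported in this sketch.
    """
    if not isinstance(key, tuple):
        key = (key,)
    ellipsis_positions = [i for i, k in enumerate(key) if k is Ellipsis]
    if len(ellipsis_positions) > 1:
        raise IndexError("an index can only have a single ellipsis")
    n_given = len(key) - len(ellipsis_positions)
    if n_given > ndim:
        raise IndexError("too many indices for array")
    # Full slices stand in for every dimension the key did not mention.
    fill = (slice(None),) * (ndim - n_given)
    if ellipsis_positions:
        i = ellipsis_positions[0]
        return key[:i] + fill + key[i + 1:]
    return key + fill


single = normalize_index(1, 3)
trailing = normalize_index((Ellipsis, 0), 3)
leading = normalize_index((slice(1, 3), Ellipsis), 2)
```

A duck array's `__getitem__` can call such a helper first and then work with a key of known length, instead of re-implementing the `Ellipsis` and padding rules in every class.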
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi All, I doubt I'm really the last one thinking ndarray subclassing is a good idea, but as that was stated, I feel I should at least pipe in. It seems to me there is both a perceived problem -- with the two subclasses that numpy provides -- `matrix` and `MaskedArray` -- both being problematic in ways that seem to me to have very little to do with subclassing being a bad idea, and a real one following from the fact that numpy was written at a time when python's inheritance system was not as well developed as it is now. Though based on my experience with Quantity, I'd also argue that the more annoying problems are not so much with `ndarray` itself, but rather with the helper functions. Ufuncs were not so bad -- they really just needed a better override mechanism, which `__array_ufunc__` now provides -- but for quite a few of the other functions subclassing was clearly an afterthought. Indeed, `MaskedArray` provides a nice example of this, with its many special `np.ma.<function>` routines, providing huge duplication and thus lots of duplicated bugs (which Eric has been patiently fixing...). Indeed, `MaskedArray` is also a much better example than `ndarray` of a class that is really hard to subclass (even though, conceptually, it should be a far easier one). All that said, duck-type arrays make a lot of sense, and e.g. the slicing and shaping methods are easily emulated, especially if one's underlying data are stored in `ndarray`. For astropy's version of a relevant mixin, see http://docs.astropy.org/en/stable/api/astropy.utils.misc.ShapedLikeNDArray.h... All the best, Marten
![](https://secure.gravatar.com/avatar/6401b8425eed08fcbaffffeeaceac894.jpg?s=120&d=mm&r=g)
On Fri, Jul 7, 2017 at 4:27 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
My biggest problem with subclassing as it exists now is that they don't survive the first encounter with np.asarray (or np.array). So much code written to work with numpy uses that as a bandaid (for e.g. handling lists) that in my experience it's 50/50 whether passing a subclass to a function will actually behave as expected--even if there's no good reason it shouldn't. Ryan -- Ryan May
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Ryan, Indeed, the liberal use of `np.asarray` is one of the main reasons the helper routines are relatively annoying. Of course, that is not an argument for using duck-types over subclasses: those wouldn't even survive `asanyarray` (which many numpy routines now have moved to). All the best, Marten
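The difference between the two conversion functions can be seen in a few lines. The classes `Tagged` and `Duck` below are hypothetical stand-ins for a real subclass and a real duck type, made up for this sketch.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical minimal ndarray subclass."""


class Duck:
    """Hypothetical duck type that can only convert itself to an ndarray."""

    def __init__(self, data):
        self.data = list(data)

    def __array__(self, dtype=None, copy=None):
        return np.array(self.data, dtype=dtype)


t = np.arange(3).view(Tagged)

# asarray strips the subclass; asanyarray passes it through.
stripped = type(np.asarray(t))
preserved = type(np.asanyarray(t))

# A duck type is converted to a plain ndarray by both functions.
duck_result = type(np.asanyarray(Duck([1, 2, 3])))
```

So a function that calls `np.asanyarray` on its input keeps subclasses intact, while a duck type is flattened to a base `ndarray` either way.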
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Fri, Jul 7, 2017 at 6:42 PM, Ryan May <rmay31@gmail.com> wrote:
As a downstream developer: The problem is that we cannot trust any array subclass or anything that pretends to be like an array. Even asarray already lets too many things through. We would need an indication or guarantee that the behavior quacks in the correct way, otherwise it is very difficult to write code that would work for various subclasses. (Even in the simplest case, writing code that works for matrix and arrays beyond a few lines is getting difficult.) scipy.stats.mstats is largely not code duplication; it needs to handle the mask (although the nan versions in scipy.stats are catching up). Josef
participants (17)

- Allan Haldane
- Ben Rowland
- Benjamin Root
- Charles R Harris
- Chris Barker
- David Cournapeau
- Evgeni Burovski
- josef.pktd@gmail.com
- Marten van Kerkwijk
- Matthew Brett
- Pauli Virtanen
- Peter Cock
- Ralf Gommers
- Ryan May
- Sebastian Berg
- Stefan van der Walt
- Stephan Hoyer