[Numpy-discussion] Separating out the maskna code
Travis Oliphant
travis at continuum.io
Sun May 20 01:08:27 EDT 2012
Wow, Nathaniel. This looks like a nice piece of tedious work. I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8
Of course, first I would want to hear from Mark, to hear his comments about what was removed.
-Travis
On May 19, 2012, at 3:54 PM, Nathaniel Smith wrote:
> Hi all,
>
> Since Mark's original missingdata branch made so many changes, I
> figured it would be a useful exercise to figure out what code in
> master is actually related to masked arrays, and which isn't. The
> easiest way seemed to be to delete the new fields, then keep removing
> any code that depended on them until everything built again, which
> gives this:
> https://github.com/njsmith/numpy/commits/separate-maskna
>
> Possible uses:
> - Use the diff as a checklist for going through to change the API
> - Use the diff as a checklist for adding an experimental-API-only flag
> - Merge into master, then use as a reference to cherrypick the
> pieces that we want to save (there is some questionable stuff in here
> -- e.g. copy/paste code, hand-wavy half-implementation of "multi-NA"
> support, and PyArray_ReduceWrapper, see below)
> - Merge into master and then immediately 'git revert' the changes on
> a branch, which would effectively 'move it aside' so we can release
> 1.7 while Mark and Travis to continue hacking on it at their own pace
>
> This is a huge patch, but I was pretty careful not to cause any
> accidental non-maskna-related regressions. The whole PyArray_Diagonal
> thread actually happened because I noticed that test_maskna had the
> only tests for np.diagonal, so I wanted to write proper ones that
> would be independent of the maskna code. Also I ran the following
> tests:
> - numpy.test("full")
> - scipy v0.10.1, test("full")
> - matplotlib current master
> - pandas v0.7.3
> and everything looks good.
>
> The other complicated thing to handle was the new
> PyArray_ReduceWrapper function that was added to the public multiarray
> API. Conceptually, this function has only a glancing relationship to
> masked arrays per se. But, it has its own problems, and I don't think
> it should be exposed in 1.7. Partly because its signature necessarily
> changes depending on whether maskna support exists. Partly because
> it's just kind of ugly (it has 15 arguments, after I removed some[1]).
> But mostly because it gives us two independent generic implementations
> of functions that do array->scalar operations, which seems like
> something we absolutely don't want to commit to supporting
> indefinitely. And the "generalized ufunc" alternative seems a lot more
> promising. So, that branch also has a followup patch that does the
> necessary hacking to get it out of the public API.
>
> Unfortunately, this patch is dependent on the previous one -- I'm not
> sure how to untangle PyArray_ReduceWrapper while keeping the maskna
> support in, which makes the "global experimental flag" idea for 1.7
> hard to implement (assuming that others agree about
> PyArray_ReduceWrapper being unready for public exposure).
>
> At this point, it might easiest to just merge this branch to master,
> immediately revert it on a branch for Mark and Travis to work on, and
> then release 1.7.
>
> Ralf, IIUC merging this and my other outstanding PRs would leave the
> datetime issues on python3/win32 as the only outstanding blocker?
>
> - N
>
> [1] http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html
> , number 11
>
> ------------ Commit messages follow for reference/discussion
>
> The main change is commit 4c16813c23b20:
> Remove maskna API from ndarray, and all (and only) the code supporting it
>
> The original masked-NA-NEP branch contained a large number of changes
> in addition to the core NA support. For example:
> - ufunc.__call__ support for where= argument
> - nditer support for arbitrary masks (in support of where=)
> - ufunc.reduce support for simultaneous reduction over multiple axes
> - a new "array assignment API"
> - ndarray.diagonal() returning a view in all cases
> - bug-fixes in __array_priority__ handling
> - datetime test changes
> etc. There's no consensus yet on what should be done with the
> maskna-related part of this branch, but the rest is generally useful
> and uncontroversial, so the goal of this branch is to identify exactly
> which code changes are involved in maskna support.
>
> The basic strategy used to create this patch was:
> - Remove the new masking-related fields from ndarray, so no arrays
> are masked
> - Go through and remove all the code that this makes
> dead/inaccessible/irrelevant, in a largely mechanical fashion. So
> for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that
> whole block was obviously just dead code if no arrays have masks,
> and I removed it. Likewise for function arguments like skipna that
> are useless if there aren't any NAs to skip.
>
> This changed the signature of a number of functions that were newly
> exposed in the numpy public API. I've removed all such functions from
> the public API, since releasing them with the NA-less signature in 1.7
> would create pointless compatibility hassles later if and when we add
> back the NA-related functionality. Most such functions are removed by
> this commit; the exception is PyArray_ReduceWrapper, which requires
> more extensive surgery, and will be handled in followup commits.
>
> I also removed the new ndarray.setasflat method. Reason: a comment
> noted that the only reason this was added was to allow easier testing
> of one branch of PyArray_CopyAsFlat. That branch is now the main
> branch, so that isn't an issue. Nonetheless this function is arguably
> useful, so perhaps it should have remained, but I judged that since
> numpy's API is already hairier than we would like, it's not a good
> idea to add extra hair "just in case". (Also AFAICT the test for this
> method in test_maskna was actually incorrect, as noted here:
> https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py
> so I'm not confident that it ever worked in master, though I haven't
> had a chance to follow-up on this.)
>
> I also removed numpy.count_reduce_items, since without skipna it
> became trivial.
>
> I believe that these are the only exceptions to the "remove dead code"
> strategy.
>
> The ReduceWrapper untangling is a001fb29c9:
> Remove PyArray_ReduceWrapper from public API
>
> There are two reasons to want to keep PyArray_ReduceWrapper out of the
> public multiarray API:
> - Its signature is likely to change if/when masked arrays are added
> - It is essentially a wrapper for array->scalar transformations
> (*not* just reductions as its name implies -- the whole reason it
> is in multiarray.so in the first place is to support count_nonzero,
> which is not actually a reduction!). It provides some nice
> conveniences (like making it easy to apply such functions to
> multiple axes simultaneously), but, we already have a general
> mechanism for writing array->scalar transformations -- generalized
> ufuncs. We do not want to have two independent, redundant
> implementations of this functionality, one in multiarray and one in
> umath! So in the long run we should add these nice features to the
> generalized ufunc machinery. And in the short run, we shouldn't add
> it to the public API and commit ourselves to supporting it.
>
> However, simply removing it from numpy_api.py is not easy, because
> this code was used in both multiarray and umath. This commit:
> - Moves ReduceWrapper and supporting code to umath/, and makes
> appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and
> cleaning up the header files).
> - Reverts numpy.count_nonzero to its previous implementation, so that
> it loses the new axis= and keepdims= arguments. This is
> unfortunate, but this change isn't so urgent that it's worth tying
> our APIs in knots forever. (Perhaps in the future it can become a
> generalized ufunc.)
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list