Hi All,

This post is to open a discussion of the future of ufuncs. Two contradictory ideas have floated about regarding the evolution of ufuncs. One is to generalize ufuncs to operate on buffers, essentially separating them from their current entanglement with ndarrays. The other is to accept that they are fundamentally part of the ndarray universe and move them into the multiarray module, thus avoiding the odd overloading of functions in the multiarray module.

The first is a long-standing proposal that I once thought sounded good, but I've come to prefer the second. That change of mind was driven by the resulting code simplification and by the removal of a dependence on a Python feature, buffers, that we cannot easily modify to adapt to changing needs and new dtypes. If we decide to move the ufuncs, I'd like to do it sometime after NumPy 1.14 is released, so now seems a good time to decide the issue.

Thoughts?

Chuck
On Sun, 2017-05-28 at 14:53 -0600, Charles R Harris wrote:
I have not thought about it much. But I agree that the dtypes are probably the biggest issue; also, I am no longer sure there is much to gain from having ufuncs work on buffers in any case. Does the dtype issue go back a bit to ideas like datashape, i.e. trying to make the dtypes somewhat separate from NumPy? Though I doubt I would want to make that an explicit goal.

I wonder how much of the C loops and type resolution we could/should expose. What I mean is that ufuncs consist of:

* type resolution (somewhat ufunc specific)
* outer loops (normal, reduce, etc.) using nditer (buffering)
* inner 1-d loops

It is a bit more complicated, but I wonder whether it might make sense to try to expose the individual ufunc pieces (type resolution and the 1-d loop) but not the outer-loop nditer setup, which is ndarray specific in any case (honestly, I am not sure; it is entirely possible that it is already exposed).

- Sebastian
Chuck
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
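To make Sebastian's three-layer breakdown concrete, here is a pure-Python sketch that imitates it with public NumPy calls. It is only an illustration, not how NumPy's C implementation is organized. `np.result_type` stands in for a ufunc's type resolver. `np.nditer` with the `external_loop` and `buffered` flags plays the outer loop. The hypothetical `add_inner_loop` only ever sees 1-d chunks. The helper names `resolve_types`, `add_inner_loop`, `outer_loop` and `my_add` are made up for this example.

```python
import numpy as np

# Hypothetical helpers mimicking the three ufunc layers; not NumPy internals.


def resolve_types(*operands):
    """Layer 1, type resolution: pick a common loop dtype.
    np.result_type is a stand-in for a real ufunc type resolver."""
    return np.result_type(*operands)


def add_inner_loop(x, y, out):
    """Layer 3, inner 1-d loop: knows nothing about shapes or casting,
    it just walks three equally long 1-d chunks."""
    for i in range(x.shape[0]):
        out[i] = x[i] + y[i]


def outer_loop(inner, a, b):
    """Layer 2, outer loop: broadcasting, casting and buffering via nditer.
    This is the ndarray-specific part."""
    dtype = resolve_types(a, b)
    it = np.nditer(
        [a, b, None],  # None -> nditer allocates the output
        flags=["external_loop", "buffered"],
        op_flags=[["readonly"], ["readonly"], ["writeonly", "allocate"]],
        op_dtypes=[dtype, dtype, dtype],  # inputs are cast (safely) via buffers
    )
    with it:
        for x, y, z in it:
            inner(x, y, z)
        return it.operands[2]


def my_add(a, b):
    return outer_loop(add_inner_loop, np.asarray(a), np.asarray(b))


a = np.arange(6).reshape(2, 3)    # integer input
b = np.array([0.5, 1.5, 2.5])     # float input, broadcast along rows
result = my_add(a, b)
```

Only `outer_loop` knows about ndarrays, broadcasting and casting. The resolver and the inner loop could in principle be exposed on their own, which is the split being floated above.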
Hi Chuck,

Like Sebastian, I wonder a little about what level you are talking about. Presumably, it is the actual implementation of the ufunc? I.e., this is not about the upper logic that decides which `__array_ufunc__` to call, etc.

If so, I agree with you that it would make the most sense to move the implementation to `multiarray`; the current structure certainly is a major hurdle to understanding how things work!

Indeed, in terms of my earlier suggestion to make much of a ufunc happen in `ndarray.__array_ufunc__`, one could see the type resolution and iteration happening there. If the inner loops were exposed, anyone working with buffers could then use the ufuncs by defining their own `__array_ufunc__`.

All the best,

Marten
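Here is a minimal sketch of the `__array_ufunc__` route for buffer holders. It assumes a hypothetical `BufferArray` wrapper around any object that supports the buffer protocol. It uses the existing override mechanism (NEP 13) and simply unwraps to zero-copy ndarray views. It does not use exposed inner loops, which this thread is only discussing.

```python
import array

import numpy as np


class BufferArray:
    """Hypothetical 1-d container around any buffer-protocol object.

    It is not an ndarray subclass; it opts into ufuncs through the
    __array_ufunc__ override protocol (NEP 13).
    """

    def __init__(self, buf):
        self.buf = memoryview(buf)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        def unwrap(x):
            # np.asarray on a memoryview wraps the same memory (no copy).
            return np.asarray(x.buf) if isinstance(x, BufferArray) else x

        inputs = tuple(unwrap(x) for x in inputs)
        if "out" in kwargs:
            # NumPy always passes `out` to __array_ufunc__ as a tuple.
            kwargs["out"] = tuple(unwrap(x) for x in kwargs["out"])
        # Delegate to the existing ndarray implementation of the ufunc.
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # Re-wrap array results; scalars (e.g. from reduce) and tuples
        # (multiple-output ufuncs) are passed through unchanged.
        if isinstance(result, np.ndarray):
            return BufferArray(result)
        return result


# Usage with a plain array.array that the caller never converts itself.
x = BufferArray(array.array("d", [1.0, 2.0, 3.0]))
y = np.add(x, 1.0)        # dispatches to BufferArray.__array_ufunc__
total = np.add.reduce(x)  # method == "reduce"
```

Multiple-output ufuncs such as `np.divmod` return a tuple, which this sketch hands back without re-wrapping.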
On Mon, May 29, 2017 at 12:32 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
The idea of separating ufuncs from ndarray was put forward many years ago, maybe five or six. What I seek here is a record that we have given up on that ambition, so that we do not need to take it into consideration in the future. In particular, we can feel free to couple ufuncs even more tightly with ndarray.

Chuck
On Mon, May 29, 2017 at 1:51 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think we do want to separate ufuncs from ndarray semantically: it should be possible to use ufuncs on sparse arrays, dask arrays, etc. But I don't think that altering ufuncs to work directly on buffer/memoryview objects, or shipping them as a separate package from the rest of NumPy, is a useful step towards this goal.

Right now, handling buffers/memoryviews is easy: one can trivially convert between them and ndarray without making any copies. I don't know of any interesting problems that are blocked because ufuncs work on ndarrays instead of buffer/memoryview objects. The interesting problems are those where a fundamentally different storage strategy is involved, like sparse/dask/... arrays. Similarly, I don't see what problems are solved by splitting them out for building or distribution.

OTOH, trying to accomplish either of these things definitely has a cost in terms of churn, complexity, doubled release-management workload, etc. Even the current split between the multiarray and umath modules causes problems all the time. They are mostly boring problems, like little utility functions that are needed in both places but awkward to share, or problems caused by the complicated machinery needed to let the two modules interact properly (set_numeric_ops and all that). None of this seems to add any value.

Plus, there's a major problem: buffers/memoryviews have no way to represent all the dtypes we currently support (e.g. datetime64) and no way to add new ones. The only way to fix this would be to write a PEP, shepherd patches through python-dev, wait for the next major Python release, and then drop support for all older Python releases. None of this is going to happen soon; we should probably plan on the assumption that it will never happen. So I don't see how this could work at all.
So my vote is for merging the multiarray and umath code bases together, and then taking advantage of the resulting flexibility to refactor the internals to provide cleanly separated interfaces at the API level.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
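Nathaniel's two buffer points can be checked directly. The sketch below uses hypothetical helper names. It round-trips a float array through `memoryview` and back with `np.asarray`, and uses `np.shares_memory` to confirm that no copy is made. It then tries to export a `datetime64` array. As far as I know, NumPy refuses that export with a `ValueError`, because PEP 3118 buffer format strings have no code for datetimes.

```python
import numpy as np


def roundtrip_shares_memory(arr):
    """Export arr as a memoryview and re-import it; report whether the
    result still points at arr's memory (i.e. no copy was made)."""
    view = memoryview(arr)   # ndarray -> buffer protocol
    back = np.asarray(view)  # buffer protocol -> ndarray
    return np.shares_memory(arr, back)


def can_export_buffer(arr):
    """Return False if NumPy cannot describe arr's dtype as a buffer."""
    try:
        memoryview(arr)
    except ValueError:
        # NumPy reports e.g. "cannot include dtype 'M' in a buffer".
        return False
    return True


floats = np.arange(4.0)
dates = np.array(["2017-05-28", "2017-05-29"], dtype="datetime64[D]")

floats_zero_copy = roundtrip_shares_memory(floats)
floats_exportable = can_export_buffer(floats)
dates_exportable = can_export_buffer(dates)
```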
participants (4)
- Charles R Harris
- Marten van Kerkwijk
- Nathaniel Smith
- Sebastian Berg