A roadmap for NumPy - longer term planning
At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
Eventually it could become a NEP or formalized in another way.
Matti
On Thu, May 31, 2018 at 4:50 PM, Matti Picus <matti.picus@gmail.com> wrote:
At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
Thanks for writing that up!
Eventually it could become a NEP or formalized in another way.
A NEP doesn't sound quite right, but moving from wiki to somewhere more formal and with more control over the contents (e.g. numpy.org or in the docs) would be useful. A roadmap could/should also include things like required effort, funding and knowledge/people required.
A couple of comments on the content:
- a mention of stability or backwards compatibility goals under philosophy would be useful
- the "Could potentially be split out into separate packages..." should be removed I think - the maskedarray one was already rejected, and the rest are similarly unhelpful.
- "internal refactorings": MaskedArray yes, but the other ones no. numpy.distutils and f2py are very hard to test, a big refactor pretty much guarantees breakage. there's also not much need for refactoring, because those things are not coupled to the numpy.core internals. numpy.financial is simply uninteresting - we wish it wasn't there but it is, so now it simply stays where it is.
- One item that I think is missing under "New functionality" is runtime switching of backend for numpy.linalg (IIRC discussed on this list before) and numpy.random (MKL devs are interested in this).
Cheers, Ralf
Hi Matti,
Thanks for sharing the roadmap. Overall, it looks very nice. A practical question is whether you want input via the mailing list, or should one just edit the wiki and add questions or so?
As the roadmap mentioned interaction with python proper (and a possible PEP): one thing that always slightly annoyed me is that numpy math is way slower for scalars than python math - and duplicates all the function names. It would seem to make sense to allow python's math module to be overridden for non-python input, including arrays. That could be another PEP...
All the best,
Marten
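The scalar overhead described in this message can be measured with a small script. This is only a sketch: the input value, the repeat count and the choice of sin are arbitrary assumptions, and the numbers will differ by machine and NumPy version.

```python
import math
import timeit

import numpy as np

x = 0.5        # arbitrary Python float (assumption, not from the thread)
n = 100_000    # arbitrary repeat count

# Time the same scalar operation through the stdlib and through NumPy.
t_math = timeit.timeit(lambda: math.sin(x), number=n)
t_numpy = timeit.timeit(lambda: np.sin(x), number=n)

print(f"math.sin: {t_math / n * 1e9:.0f} ns per call")
print(f"np.sin:   {t_numpy / n * 1e9:.0f} ns per call")
```

Both calls return the same value to floating-point precision; only the per-call cost differs.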
On Fri, Jun 1, 2018 at 4:43 AM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
one thing that always slightly annoyed me is that numpy math is way slower for scalars than python math
numpy is also quite a bit slower than raw python for math with (very) small arrays:
In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [32]: a
Out[32]: array([ 3.4, 5.6])
In [33]: % timeit a2 = a * 10
941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
(I often want to do this sort of thing, not for performance, but for ease of computation -- say you have two or three coordinates that represent a point -- it's really nice to be able to scale or shift with array operations, rather than all that indexing -- but it is pretty slow with numpy.)
I've wondered if numpy could be optimized for small 1D arrays, and maybe even 2D arrays with a small fixed second dimension (N x 2, N x 3), by special-casing / short-cutting those cases.
It would require some careful profiling to see if it would help, but it sure seems possible.
And maybe scalars could be fit into the same system.
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Fri, Jun 1, 2018 at 9:46 AM, Chris Barker <chris.barker@noaa.gov> wrote:
numpy is also quite a bit slower than raw python for math with (very) small arrays:
Doing a bit more experimentation, pure python keeps the advantage out to more than 10 elements (I got bored...). But I noticed that the time for the numpy computation is pretty much constant from 2 up to around 100 elements, which implies that the bulk of the issue is with "startup" costs, rather than fancy indexing or anything like that. So maybe a short cut wouldn't be helpful.
Note that if you use a list comp (the pythonic translation of an array operation), the crossover point is about 15 elements (in my tests, on my machine...):
In [90]: % timeit t2 = [x * 10 for x in t]
920 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
-CHB
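The crossover experiment above can be repeated outside IPython with a sketch like the following. The helper name `bench`, the list of sizes and the repeat count are assumptions made for illustration; the resulting timings and crossover point depend on the machine and the NumPy version.

```python
import timeit

import numpy as np


def bench(n, repeats=20_000):
    """Return (list-comp ns, numpy ns) per operation for n elements."""
    t = tuple(float(i) for i in range(n))
    a = np.array(t)
    t_list = timeit.timeit(lambda: [x * 10 for x in t], number=repeats)
    t_arr = timeit.timeit(lambda: a * 10, number=repeats)
    return t_list / repeats * 1e9, t_arr / repeats * 1e9


# Sizes chosen to bracket the crossover reported in the thread.
sizes = (2, 5, 10, 15, 20, 50, 100)
results = {n: bench(n) for n in sizes}

for n, (py_ns, np_ns) in results.items():
    print(f"n={n:4d}  list comp: {py_ns:7.0f} ns   numpy: {np_ns:7.0f} ns")
```

If the numpy column stays roughly flat while the list-comp column grows with n, that is consistent with the fixed per-call overhead described above.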
Hi Ralf, On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
- "internal refactorings": MaskedArray yes, but the other ones no. numpy.distutils and f2py are very hard to test, a big refactor pretty much guarantees breakage. there's also not much need for refactoring, because those things are not coupled to the numpy.core internals. numpy.financial is simply uninteresting - we wish it wasn't there but it is, so now it simply stays where it is.
I want to clarify that in the current notes we put down ideas that prompted active discussion, even if they weren't necessarily feasible. I feel it is important to keep the conversation open to run its course until we have a good understanding of the various issues at hand.
You may find that, in person, people are more willing to admit to their support for some "heretical" ideas than they are here on the list.
E.g., you say that the financial functions "now simply stay", but that promises a future of a NumPy that never shrinks, while there is certainly some support for allowing NumPy to contract so that we can release maintenance burden and allow development of other core areas that have been neglected for a long time.
You will *always* have small, vocal proponents of any specific piece of functionality; that doesn't necessarily mean that such functionality contributes to the health of a project as a whole.
So, I gently urge us to carefully reconsider the narrative that nothing can change/be removed, and evaluate each suggestion carefully, not weighing only the very evident negatives but also the longer term positives.
Best regards, Stéfan
I would love to see gufuncs become more general. Specifically I would like an optional prologue and epilogue function. The prologue could potentially 1) inspect parameterized dtypes, 2) kwargs, 3) set non-trivial output array sizes, 4) initialize data structures, 5) defer processing to other functions (BLAS). The epilogue function could do any cleanup of data structures.
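The prologue/epilogue idea for gufuncs can be approximated in pure Python to show the intended control flow. This is a hypothetical sketch, not an existing or proposed NumPy API: `make_generalized`, `prologue`, `epilogue` and the `core_kwargs` key are names invented here, and `np.matmul` merely stands in for the core loop.

```python
import numpy as np


def make_generalized(core, prologue=None, epilogue=None):
    """Wrap `core` so a prologue runs before it and an epilogue after it.

    Hypothetical helper for illustration only; NumPy has no such hook.
    """
    def wrapped(*arrays, **kwargs):
        # The prologue may inspect inputs/kwargs and prepare state.
        state = prologue(arrays, kwargs) if prologue is not None else {}
        try:
            return core(*arrays, **state.get("core_kwargs", {}))
        finally:
            # The epilogue always runs, so it can clean up the state.
            if epilogue is not None:
                epilogue(state)
    return wrapped


log = []


def prologue(arrays, kwargs):
    a, b = arrays
    # Decide the output size and dtype up front (2-D matmul case only).
    out_shape = a.shape[:-1] + b.shape[1:]
    out = np.empty(out_shape, dtype=np.result_type(a, b))
    log.append(("prologue", out_shape))
    return {"core_kwargs": {"out": out}}


def epilogue(state):
    log.append(("epilogue", None))


matmul_with_hooks = make_generalized(np.matmul, prologue, epilogue)
result = matmul_with_hooks(np.ones((2, 3)), np.ones((3, 4)))
print(result.shape, log)
```

A real implementation would presumably live at the C level next to the inner loop; the wrapper only shows where the two hooks would sit relative to the core computation.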
While we are in the crazy wish-list: having dtypes that are universal enough for pandas to use them and export their columns with them would be my crazy wish. I hope that it would help adding more uniform support for things like categorical variables in the pydata ecosystem. Gaël
On Fri, Jun 1, 2018 at 9:57 AM, Stefan van der Walt <stefanv@berkeley.edu> wrote:
Hi Ralf,
On Thu, 31 May 2018 21:57:06 -0700, Ralf Gommers wrote:
- "internal refactorings": MaskedArray yes, but the other ones no. numpy.distutils and f2py are very hard to test, a big refactor pretty much guarantees breakage. there's also not much need for refactoring, because those things are not coupled to the numpy.core internals. numpy.financial is simply uninteresting - we wish it wasn't there but it is, so now it simply stays where it is.
I want to clarify that in the current notes we put down ideas that prompted active discussion, even if they weren't necessarily feasible. I feel it is important to keep the conversation open to run its course until we have a good understanding of the various issues at hand.
You may find that, in person, people are more willing to admit to their support for some "heretical" ideas than they are here on the list.
Thanks Stefan, good points. I totally agree that anything can be discussed.
E.g., you say that the financial functions "now simply stay", but that promises a future of a NumPy that never shrinks, while there is certainly some support for allowing NumPy to contract so that we can release maintenance burden and allow development of other core areas that have been neglected for a long time.
You will *always* have small, vocal proponents of any specific piece of functionality; that doesn't necessarily mean that such functionality contributes to the health of a project as a whole.
So, I gently urge us to carefully reconsider the narrative that nothing can change/be removed, and evaluate each suggestion carefully, not weighing only the very evident negatives but also the longer term positives.
I don't think there's such a narrative - e.g. the removal of np.matrix that we've planned and getting rid of MaskedArray at some point once we have a better new masked array implementation are *major* removals. We do plan those things because they have major benefits. Imho "major benefits" is a bar that needs to be passed before listing features as up for removal on a roadmap (even a draft one).
It would be helpful maybe to find a form for the roadmap where the essentials of such discussions (key pros/cons) can be captured. Or at least split it in good/desirable/planned items and "wild ideas".
Re `financial`, there isn't much of a pro as far as I can tell - there's almost zero maintenance cost now, and it doesn't hinder any of the proposed new features. Plus it's a discussion we've had a couple of times before.
I know that the current roadmap doc is only draft, but it still says "NumPy Roadmap" and it's the best thing we have now, so I'd prefer to not have things there (or have them in a separate random/controversial ideas section) that are unlikely to happen or for which it's unclear if they're good ideas.
Cheers, Ralf
I like the idea of a random/controversial ideas section. On Fri, Jun 1, 2018 at 12:11 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I know that the current roadmap doc is only draft, but it still says "NumPy Roadmap" and it's the best thing we have now, so I'd prefer to not have things there (or have them in a separate random/controversial ideas section) that are unlikely to happen or for which it's unclear if they're good ideas.
On Thu, May 31, 2018, 19:50 Matti Picus <matti.picus@gmail.com> wrote:
At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
Eventually it could become a NEP or formalized in another way.
Matti
Some things I have seen mentioned but don't know the current plans for:
* Categorical arrays
* Releasing the GIL wherever possible
* Using multithreading internally
* Making use of the next generation blas when available and staying involved in planning to make sure it supports our needs
* Figure out where to use Cython and where not to
On Fri, Jun 1, 2018, 11:27 Todd <toddrjen@gmail.com> wrote:
Some things I have seen mentioned but don't know the current plans for:
* Categorical arrays
* Releasing the GIL wherever possible
* Using multithreading internally
* Making use of the next generation blas when available and staying involved in planning to make sure it supports our needs
* Figure out where to use Cython and where not to
Also:
* Figure out the best way to handle strings. This may involve multiple approaches for different situations but the current approach may not be the best default approach.
* Decimal and/or rational arrays
* If yes to labeled arrays, then there should probably be a PEP about label-based indexing
* A decision about how to handle numpy 2.0
On Thu, May 31, 2018 at 5:50 PM, Matti Picus <matti.picus@gmail.com> wrote:
At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
Eventually it could become a NEP or formalized in another way.
Matti
Under maintenance we could add something about the transition to Python 3, in particular cleaning up the code and updating the documentation examples. Chuck
Hi,
Do you plan to consider trying to add PEP 574 / pickle5 support? There's an implementation ready (and a PyPI backport) that you can play with. https://www.python.org/dev/peps/pep-0574/
PEP 574 implicitly targets Numpy arrays as one of its primary producers, since Numpy arrays are how large scientific or numerical data often ends up represented and where zero-copy is often desired by users.
PEP 574 could certainly be useful even without Numpy arrays supporting it, but less so. So I would welcome any feedback on that front (and, given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd ideally like to have that feedback sometime in the forthcoming months ;-)).
Best regards
Antoine.
On Thu, 31 May 2018 16:50:02 -0700 Matti Picus <matti.picus@gmail.com> wrote:
At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
Eventually it could become a NEP or formalized in another way.
Matti
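For readers unfamiliar with PEP 574, the out-of-band buffer mechanism it describes looks roughly like this from the user's side. The sketch assumes Python 3.8 or later and a NumPy release that implements pickle protocol 5, neither of which existed when this message was written; the array size is arbitrary.

```python
import pickle

import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Protocol 5 with a buffer_callback: the array data is handed over as
# out-of-band buffers instead of being copied into the pickle stream.
buffers = []
payload = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)

# For comparison, an in-band pickle embeds all of the array bytes.
in_band = pickle.dumps(a, protocol=4)

# The consumer must supply the same buffers when loading.
b = pickle.loads(payload, buffers=buffers)

print(f"out-of-band stream: {len(payload)} bytes, {len(buffers)} buffer(s)")
print(f"in-band stream:     {len(in_band)} bytes")
```

The out-of-band stream holds only metadata, so a transport layer is free to move the buffers separately without an extra copy.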
PEP-574 isn't on the roadmap (yet!), but I think we would clearly welcome it. Like all NumPy improvements, it would need to be implemented by an interested party.
On Mon, Jun 4, 2018 at 1:52 AM Antoine Pitrou <antoine@python.org> wrote:
Hi,
Do you plan to consider trying to add PEP 574 / pickle5 support? There's an implementation ready (and a PyPI backport) that you can play with. https://www.python.org/dev/peps/pep-0574/
participants (12)
- Antoine Pitrou
- Charles R Harris
- Chris Barker
- Gael Varoquaux
- Jarrod Millman
- Marten van Kerkwijk
- Matthew Harrigan
- Matti Picus
- Ralf Gommers
- Stefan van der Walt
- Stephan Hoyer
- Todd