ENH: Introducing a `pipe` method for NumPy arrays
Hello NumPy community,

I'm proposing the introduction of a `pipe` method for NumPy arrays to enhance their usability and expressiveness. Similar to other data processing libraries like pandas, a `pipe` method would allow users to chain operations together in a more readable and intuitive manner. Consider the following examples, where method chaining with `pipe` improves readability compared to traditional NumPy code:

# ********************************************************************
# Class PipeableArray, just for illustration
import numpy as np

class PipeableArray:
    def __init__(self, array: np.ndarray):
        self.array = array

    def pipe(self, func, *args, **kwargs):
        """Apply func and return the result wrapped in PipeableArray."""
        try:
            result = func(self.array, *args, **kwargs)
        except Exception as exc:
            raise RuntimeError('Oops, something went wrong...') from exc
        return PipeableArray(result)

    def __repr__(self):
        return repr(self.array)

# ********************************************************************
# Original code using traditional NumPy: repeated assignment
arr = np.array([1, 2, 3, 4, 5])
arr = np.square(arr)
arr = np.log(arr)
arr = np.cumsum(arr)

# Original code using traditional NumPy: nested function calls
arr = np.arange(1., 5.)
result = np.cumsum(np.log(np.square(arr)))

# ********************************************************************
# Proposed NumPy method chaining using a new pipe method
arr = PipeableArray(np.arange(1., 5.))
result = (arr
          .pipe(np.square)
          .pipe(np.log)
          .pipe(np.cumsum)
          )
# ********************************************************************

Benefits:
- Readability: Method chaining with pipe offers a more readable and intuitive way to express complex data transformations, making the intended data processing pipeline easier to understand.
- Customization: The pipe method allows users to chain custom functions or already implemented NumPy operations seamlessly.
- Modularity: Users can define reusable functions and chain them together using pipe, leading to cleaner and more maintainable code.
- Consistency: Introducing a pipe method in NumPy aligns with similar functionality available in other libraries like pandas, polars, etc.
- Optimization: While NumPy may not currently optimize chained expressions, the introduction of pipe lays the groundwork for potential future optimizations with lazy evaluation.

I believe this enhancement could benefit the NumPy community by providing a more flexible and expressive way to work with arrays. I'd love to see such a feature in NumPy and would like to hear your thoughts on this proposal.

Best regards,
Oyibo
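Since the illustrative `pipe` above forwards `*args` and `**kwargs`, extra operands and options could ride along in the chain. A small usage sketch with the same hypothetical class (nothing here exists in NumPy itself):

arr = PipeableArray(np.arange(1., 5.))
result = (arr
          .pipe(np.square)
          .pipe(np.clip, 2.0, 10.0)    # extra positional arguments
          .pipe(np.round, decimals=1)  # keyword arguments
          )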
This idea looks interesting, but I think that having a pipeline object like `Sequential` in PyTorch would be more intuitive than this approach.
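For illustration, a minimal sketch of what such a Sequential-style pipeline could look like for arrays -- the `Pipeline` name and API are hypothetical, loosely modeled on torch.nn.Sequential:

import numpy as np

class Pipeline:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, array):
        # Apply each stage in order, feeding the output into the next one.
        for func in self.funcs:
            array = func(array)
        return array

transform = Pipeline(np.square, np.log, np.cumsum)
result = transform(np.arange(1., 5.))

The upside would be that the pipeline is a reusable object applicable to many arrays, rather than a chain anchored to one array.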
On Thu, Feb 15, 2024 at 10:21 AM <d.lenard80@gmail.com> wrote:
Hello NumPy community,
I'm proposing the introduction of a `pipe` method for NumPy arrays to enhance their usability and expressiveness.
Adding a prominent method like this to `np.ndarray` is something that we will probably not take up ourselves unless it is adopted by the Array API standard <https://data-apis.org/array-api/latest/>. It's possible that you might get some interest there, since the Array API deliberately strips out a number of the methods that we already have (e.g. `.mean()`, `.sum()`, etc.) in favor of functions. A general way to add some kind of fluency cheaply, in an Array API-agnostic fashion, might be helpful to people trying to make numpy-only code that uses our current set of methods in this style a bit easier to port. But you'll have to make the proposal to them, I think, to get started.

-- Robert Kern
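For what it's worth, much of the same fluency can already be had cheaply with a free function instead of a method -- the sketch below is not an official API of NumPy or the Array API, just plain Python:

from functools import reduce
import numpy as np

def pipe(array, *funcs):
    """Thread array through funcs from left to right."""
    return reduce(lambda acc, func: func(acc), funcs, array)

result = pipe(np.arange(1., 5.), np.square, np.log, np.cumsum)

Because it only calls the functions it is given, this works unchanged with any Array API-compatible library.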
Hi Oyibo,
I'm proposing the introduction of a `pipe` method for NumPy arrays to enhance their usability and expressiveness.
I think it is an interesting idea, but agree with Robert that it is unlikely to fly on its own. Part of the logic of even frowning on methods like .mean() and .sum() is that ndarray is really a data container, and should have methods related to that, as much as possible independent of the meaning of those data (which is given by the dtype). A bit more generally, your example is nice, but a pipe can have just one input, while of course many operations require two or more.
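One partial workaround, using the illustrative PipeableArray from the original post: since its `pipe` forwards extra arguments, a second operand can be threaded in, though the chain still has a single "primary" input:

a = PipeableArray(np.arange(1., 5.))
b = np.full(4, 2.0)
result = a.pipe(np.square).pipe(np.add, b).pipe(np.cumsum)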
- Optimization: While NumPy may not currently optimize chained expressions, the introduction of pipe lays the groundwork for potential future optimizations with lazy evaluation.
Optimization might indeed be made possible, though I would think that for that one may be better off with something like dask.

That said, I've been playing with the ability to chain ufuncs to optimize their execution, by applying the ufuncs in series on small pieces of larger arrays, thus avoiding large temporaries (a bit like numexpr, but with the idea of defining a fast function rather than giving an expression as a string); see https://github.com/mhvk/chain_ufunc

All the best,

Marten
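A toy version of that idea, for concreteness -- this is only a sketch of the blockwise evaluation strategy, not the actual chain_ufunc implementation, and it assumes a 1-D array and elementwise ufuncs:

import numpy as np

def apply_chain(funcs, arr, blocksize=4096):
    out = np.empty_like(arr)
    for start in range(0, arr.size, blocksize):
        block = arr[start:start + blocksize]
        for func in funcs:
            block = func(block)  # temporaries stay small and cache-resident
        out[start:start + blocksize] = block
    return out

result = apply_chain([np.square, np.log, np.sin], np.arange(1., 1e6))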
What were your conclusions after experimenting with chained ufuncs? If the speed is comparable to numexpr, wouldn’t it be `nicer` to have a non-string input format? It would feel a bit less like a black box.

Regards,
DG
What were your conclusions after experimenting with chained ufuncs?
If the speed is comparable to numexpr, wouldn’t it be `nicer` to have non-string input format?
It would feel a bit less like a black-box.
I haven't gotten much further with it yet; it is just some toying around I've been doing. But I'd indeed prefer not to go via strings -- possibly numexpr could use a similar mechanism to what I did to construct the function that is being evaluated.

Aside: your suggestion of the pipe led to some further discussion at https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581 -- as a more general way of passing arrays to functions.

-- Marten
Just to clarify, I am not the one who suggested pipes. :) I did read the issue, though.

My 2 cents: from my experience, calling methods is generally faster than functions. I figure it is due to having less overhead figuring out the input. Maybe it is not significant for large data, but it does make a difference even when working with medium-sized arrays -- say, 5000 floats:

%timeit a.sum()
3.17 µs
%timeit np.sum(a)
5.18 µs

(In my experience, `sum` for medium-sized arrays often becomes a bottleneck in greedy optimisation algorithms where distances are calculated over and over for a partial space.)

In short, all I want to say is that it would be great if such speed considerations were addressed if/when developing piping or anything similar. E.g. the pipe implementation could allow registering optimisations, and numexpr could then provide a plugin. At the top, the user writes:

np.pipe_use_plugin(numexpr.plug_pipe)  # or something similar

Then numexpr would kick in whenever appropriate when using pipes.

Regards,
DG
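To make the plugin idea a bit more concrete -- everything below is hypothetical; none of these hooks exist in NumPy or numexpr today:

_pipe_backends = []

def pipe_use_plugin(backend):
    """Register a backend that may accelerate piped chains."""
    _pipe_backends.append(backend)

def run_pipe(array, funcs):
    # Give each registered backend a chance to compile the whole chain...
    for backend in _pipe_backends:
        compiled = backend.try_compile(funcs)
        if compiled is not None:
            return compiled(array)
    # ...otherwise fall back to plain left-to-right application.
    for func in funcs:
        array = func(array)
    return array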
From my experience, calling methods is generally faster than functions. I figure it is due to having less overhead figuring out the input. Maybe it is not significant for large data, but it does make a difference even when working for medium sized arrays - say float size 5000.
%timeit a.sum()
3.17 µs
%timeit np.sum(a)
5.18 µs
It is more that np.sum checks if there is a .sum() method and if so calls that. And then `ndarray.sum()` calls `np.add.reduce(array)`.

In [2]: a = np.arange(5000.)

In [3]: %timeit np.sum(a)
3.89 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit a.sum()
2.43 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit np.add.reduce(a)
2.33 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Though I must admit I'm a bit surprised the excess is *that* large for using np.sum... There may be a little micro-optimization to be found...

-- Marten
Thanks for this, every little helps.

One more thing to mention on this topic: from a certain size, the dot product becomes faster than sum (due to parallelisation, I guess?). E.g.:

def dotsum(arr):
    a = arr.reshape(1000, 100)
    return a.dot(np.ones(100)).sum()

a = np.ones(100000)

In [45]: %timeit np.add.reduce(a, axis=None)
42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [43]: %timeit dotsum(a)
26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But theoretically, sum should be faster than the dot product by a fair bit. Isn’t parallelisation implemented for it?

Regards,
DG
I cannot reproduce that:

In [3]: %timeit np.add.reduce(a, axis=None)
19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit dotsum(a)
47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But almost certainly it is indeed due to optimizations, since .dot uses BLAS, which is highly optimized (at least on some platforms, clearly better on yours than on mine!). I thought .sum() was optimized too, but perhaps less so? It may be good to raise a quick issue about this!

Thanks, Marten
On 16 Feb 2024, at 2:48 am, Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
I thought .sum() was optimized too, but perhaps less so?
I can confirm at least it does not seem to use multithreading -- with the conda-installed numpy+BLAS I almost exactly reproduce your numbers, whereas linked against my own OpenBLAS build:

In [3]: %timeit np.add.reduce(a, axis=None)
19 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# OMP_NUM_THREADS=1
In [4]: %timeit dotsum(a)
20.5 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# OMP_NUM_THREADS=8
In [4]: %timeit dotsum(a)
9.84 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

add.reduce shows no difference between the two and always remains at <= 100% CPU usage. dotsum still scales better with larger matrices, e.g. ~4x for 1000x1000.

Cheers,
Derek
Good to know it is not only on my PC.

I have done a fair bit of work trying to find a more efficient sum. The only faster option that I have found was PyTorch (although, thinking about it now, maybe that was because it was using MKL -- I don't remember). MKL is faster, but I use OpenBLAS.

The Scipp library is parallelized, and its performance becomes similar to `dotsum` for large arrays, but it is slower than numpy or dotsum for sizes below (somewhere towards) ~200k.

Apart from these I ran out of options and simply implemented my own sum, which uses either `np.sum` or `dotsum` depending on which is faster. This chart shows the point where dotsum becomes faster than np.sum: https://gcdnb.pbrd.co/images/j8n3EsRz5g5v.png?o=1

I am not sure how much (and for how many people) the improvement is needed/essential, but I found several stack posts about this when I was looking into it. It definitely is to me, though.

Theoretically, if implemented with the same optimisations, sum should be ~2x faster than dotsum. Or am I missing something?

Regards,
DG
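For reference, a minimal sketch of such a size-based dispatch; the ~200k threshold and the block width are machine-dependent guesses, not measured constants:

import numpy as np

def dot_sum(a, block=100):
    n = a.size
    if n % block:            # fall back when the size doesn't divide evenly
        return a.sum()
    # BLAS-backed row sums via dot, then reduce the much smaller result.
    return a.reshape(n // block, block).dot(np.ones(block)).sum()

def fast_sum(a, threshold=200_000):
    # Plain .sum() for small arrays, the dot trick for large ones.
    return dot_sum(a) if a.size >= threshold else a.sum()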
On Fri, Feb 16, 2024 at 12:40 AM Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
It is more that np.sum checks if there is a .sum() method and if so calls that. And then `ndarray.sum()` calls `np.add.reduce(array)`.
Also note that np.sum does a bunch of work *in pure Python*. Some of that Python code is really bad too: it uses `_wrapreduction`, which has weird semantics (trying `getattr(x, 'sum')` for any object) that we could/should remove and that currently make the function even slower. The large gap in performance has little to do with functions vs. methods; it's more that the method is implemented in C and doesn't have to defer to the function, rather than the other way around.

Cheers, Ralf
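The getattr-based semantics Ralf mentions are easy to see: np.sum will happily call a .sum() method on any object, array or not. A quick demonstration (the class is just for show; the dispatch is current NumPy behavior):

import numpy as np

class WithSum:
    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        return "custom .sum() called"

print(np.sum(WithSum()))  # -> custom .sum() called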
Hi all,

in PyTorch they (kind of) recently introduced torch.compile: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html

In TensorFlow, eager execution needs to be activated manually; otherwise it creates a graph object, which then acts like this kind of pipe.

Don't know whether that's useful info for an implementation in NumPy -- I'm just referring to what I think may be similar to pipes in other NumPy-like frameworks.

Best, Michael
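For concreteness, the torch.compile pattern looks roughly like this (requires torch >= 2.0), using the example chain from earlier in the thread:

import torch

def chain(x):
    return torch.cumsum(torch.log(torch.square(x)), dim=0)

chain_opt = torch.compile(chain)  # captures the whole chain as one graph
result = chain_opt(torch.arange(1., 5.))

The compiler sees the whole chain at once, which is exactly the kind of structure a pipe would make explicit.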
We could expand this topic for a broader perspective.

Pandas offers "custom accessors", empowering users to extend DataFrame functionality, while Polars introduces "expression plugins" for customization, enhancing DataFrame operations. These features are pretty powerful, and the obvious advantage is that users write and maintain the additional methods themselves.

https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.register_...
https://docs.pola.rs/user-guide/expressions/plugins/

For NumPy arrays, integrating similar functionality, such as a pipe function for method chaining and custom accessors for increased flexibility, would improve the user experience. These features would not only encourage cleaner, reusable, and more expressive code but also align NumPy with other data processing libraries. Furthermore, enabling method-chained pipelines to leverage acceleration techniques like JIT compilation at a later stage would further optimize performance.

Implementing a pipe method could serve as an excellent starting point for these enhancements, since it is the least effort; custom accessors and leveraging acceleration techniques might be more ambitious.
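For comparison, the pandas accessor pattern referenced above -- this is working pandas API; the "stats" namespace and its method are just example names:

import pandas as pd

@pd.api.extensions.register_dataframe_accessor("stats")
class StatsAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def zscore(self):
        # User-defined method, reachable as df.stats.zscore()
        return (self._obj - self._obj.mean()) / self._obj.std()

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
print(df.stats.zscore())

Something analogous for ndarray would let users hang domain-specific namespaces off arrays without NumPy itself growing new methods.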
Participants (9):
- d.lenard80@gmail.com
- Dom Grigonis
- Homeier, Derek
- Marten van Kerkwijk
- Michael Siebert
- Oyibo
- Rakshit Singh
- Ralf Gommers
- Robert Kern