An alternative to vectorize that lets you access the array?
Hi everyone,

Here's a problem I've been dealing with. I wonder whether NumPy has a tool that will help me, or whether this could be a useful feature request.

In the upcoming EuroPython 2020, I'll give a talk about live-coding a music synthesizer. It's going to be a fun talk: I'll use the sounddevice <https://github.com/spatialaudio/python-sounddevice/> module to make a program that plays music. Do attend, or watch it on YouTube when it's out :)

There's a part in my talk that I could make simpler, and thus shave off 3-4 minutes of cumbersome explanations. These 3-4 minutes matter a great deal to me. But for that I need to do something with NumPy, and I don't know whether it's possible or not.

The sounddevice library takes an ndarray of sound data and plays it. Currently I use `vectorize` to produce that array:

output_array = np.vectorize(f, otypes='d')(input_array)

And I'd like to replace it with this code, which is supposed to give the same output:

output_array = np.ndarray(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    output_array[i] = f(item)

The reason I want the second version is that I can then have sounddevice start playing `output_array` in a separate thread while it's being calculated. (Yes, I know about the GIL; I believe that sounddevice releases it.)

Unfortunately, the for loop is very slow, even when I'm not processing the data on a separate thread. I benchmarked it on both CPython and PyPy3, which is my target platform. On CPython it's 3 times slower than vectorize, and on PyPy3 it's 67 times slower than vectorize! That's despite the fact that the NumPy documentation says "The `vectorize` function is provided primarily for convenience, not for performance. The implementation is essentially a `for` loop."

So here are a few questions:

1. Is there something like `vectorize`, except you get to access the output array before it's finished? If not, what do you think about adding that as an option to `vectorize`?
2. Is there a more efficient way of writing the `for` loop I've written above? Or any other kind of solution to my problem?

Thanks for your help,
Ram Rachum.
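To make the comparison in the post reproducible, here is a minimal sketch of both versions side by side. The post does not show `f`, so the 440 Hz sine function and `SAMPLE_RATE` below are hypothetical stand-ins; the loop allocates with `np.empty`, which the replies recommend over calling `np.ndarray` directly.

```python
import math

import numpy as np

SAMPLE_RATE = 44100  # hypothetical sample rate, in Hz


def f(t):
    # Hypothetical per-sample function: a 440 Hz sine tone at time t (seconds).
    return math.sin(2 * math.pi * 440 * t)


input_array = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of timestamps

# Version 1: np.vectorize, as in the original post.
vectorized = np.vectorize(f, otypes='d')(input_array)

# Version 2: an explicit Python loop writing into a preallocated array.
looped = np.empty(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    looped[i] = f(item)
```

Timing both with `timeit` on the real `f` is the only way to know how large the gap actually is on a given interpreter.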
On Sun, Jul 12, 2020 at 3:02 PM Ram Rachum <ram@rachum.com> wrote:
Hi Ram,

I find your description of the behaviour really surprising! My experience with np.vectorize has been consistent with the documentation's note. Can you please provide some more context?

1. What shape is your array?
2. How exactly did you compute the runtimes?
3. How large are the runtimes we are talking about? Are you sure you're not measuring some kind of overhead?
4. What kind of work does f do? This is mostly relevant for your question about alternatives for your loop.

Unfortunately I don't believe it's possible, or that it would even _be_ possible, to give access to half-done results of computations. As far as I know, even asynchronous libraries make you wait until a result is done. So unless you chop up your array along the first dimension and explicitly work with each slice independently, I'm pretty sure this is not possible. Just imagine the wealth of possible race conditions if you could have access to half-initialized arrays.

The only actionable suggestion I have for your loop is to replace the `np.ndarray` call with one to `np.empty`. My impression has always been that arrays should be instantiated with one of the helper functions rather than directly from the type.

Personally, I don't use vectorize at all, because I tend to find that it only misleads the reader.

Regards,
András
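A minimal sketch of the chunking approach András describes, assuming a hypothetical scalar `f`: the output is allocated once with `np.empty`, and each slice along the first dimension is handed out only after it has been completely written. The generator name `compute_in_chunks` and the chunk size are illustrative, not part of NumPy or sounddevice.

```python
import numpy as np


def f(t):
    # Hypothetical scalar per-sample function; the thread never shows the real f.
    return 0.5 * t


def compute_in_chunks(input_array, chunk_size):
    """Fill one preallocated output array chunk by chunk along axis 0.

    Each slice is yielded only after every element in it has been written,
    so a consumer never sees a half-initialized region.
    """
    output_array = np.empty(input_array.shape, dtype='d')
    n = input_array.shape[0]
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        for i in range(start, stop):
            output_array[i] = f(input_array[i])
        yield output_array[start:stop]  # a view into output_array, not a copy
```

With 10 input samples and `chunk_size=4`, the generator yields views of length 4, 4 and 2, and a consumer could start on the first one while the rest are still being computed.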
Thanks for your help, Ram Rachum. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
Hi everyone,
Here's a problem I've been dealing with. I wonder whether NumPy has a tool that will help me, or whether this could be a useful feature request.
In the upcoming EuroPython 2020, I'll do a talk about live-coding a music synthesizer. It's going to be a fun talk, I'll use the sounddevice <https://github.com/spatialaudio/python-sounddevice/> module to make a program that plays music. Do attend, or watch it on YouTube when it's out :)
Sounds like a fun talk :).
There's a part in my talk that I could make simpler, and thus shave 3-4 minutes of cumbersome explanations. These 3-4 minutes matter a great deal to me. But for that I need to do something with NumPy and I don't know whether it's possible or not.
The sounddevice library takes an ndarray of sound data and plays it. Currently I use `vectorize` to produce that array:
output_array = np.vectorize(f, otypes='d')(input_array)
And I'd like to replace it with this code, which is supposed to give the same output:
output_array = np.ndarray(input_array.shape, dtype='d')
Maybe use `np.empty(input_array.shape, dtype="d")` instead. `np.ndarray` works but is pretty low-level, and I would usually avoid it for array creation.
for i, item in enumerate(input_array):
    output_array[i] = f(item)
OK, one hack you can try is to replace `item` with `item.item()`. That converts the NumPy scalar to a Python scalar, which is much more lightweight and faster. It might also give PyPy more chance to optimize `f`, I suppose.
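A minimal sketch of the `.item()` hack, with a hypothetical `f` since the real one is not shown: iterating over a `float64` array yields `np.float64` scalars, and `.item()` turns each one into a plain Python `float` before it reaches `f`.

```python
import math

import numpy as np


def f(t):
    # Hypothetical per-sample function; math.sin works directly on Python floats.
    return math.sin(2 * math.pi * 440 * t)


input_array = np.linspace(0.0, 1.0, 1000)
output_array = np.empty(input_array.shape, dtype='d')

for i, item in enumerate(input_array):
    # item is a NumPy scalar (np.float64); .item() converts it to a plain
    # Python float, which is cheaper to pass around in pure-Python code.
    output_array[i] = f(item.item())
```

A related option not raised in the thread is `input_array.tolist()`, which converts every element to a Python scalar in a single call.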
The reason I want the second version is that I can then have sounddevice start playing `output_array` in a separate thread, while it's being calculated. (Yes, I know about the GIL, I believe that sounddevice releases it.)
`np.vectorize` will definitely not release the GIL. This loop may release it in between iterations (I am not sure), but it also adds quite a bit of overhead compared to `vectorize`. The best thing, of course, would be if you can rewrite `f` to accept an array?
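What "rewrite `f` to accept an array" can look like, sketched for a hypothetical sine oscillator: the scalar version is called once per sample from Python, while the array version hands the whole loop to NumPy ufuncs. Whether the real `f` can be rewritten this way depends on what it does, which the thread does not show.

```python
import math

import numpy as np


def f_scalar(t):
    # Hypothetical per-sample version: one Python call per sample.
    return math.sin(2 * math.pi * 440 * t)


def f_array(t):
    # Same computation written with NumPy ufuncs: t may be a whole array,
    # so the loop over samples runs in compiled code instead of Python.
    return np.sin(2 * np.pi * 440 * t)


t = np.arange(44100) / 44100  # one second of timestamps at a hypothetical 44.1 kHz
samples = f_array(t)          # no Python-level loop, no vectorize
```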
Unfortunately, the for loop is very slow, even when I'm not processing the data on a separate thread. I benchmarked it on both CPython and PyPy3, which is my target platform. On CPython it's 3 times slower than vectorize, and on PyPy3 it's 67 times slower than vectorize! That's despite the fact that the NumPy documentation says "The `vectorize` function is provided primarily for convenience, not for performance. The implementation is essentially a `for` loop."
PyPy is nice because it makes NumPy just work. Unfortunately, that also adds some overhead, so at least some slowdown is probably expected. I am not sure why it is so much. I would not be surprised if a list comprehension is not much faster, especially on PyPy (assuming you cannot modify `f` to work with arrays).
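Two ways to spell the list-comprehension variant mentioned above, under the same kind of hypothetical scalar `f`. Both feed `f` plain Python floats via `tolist()`; whether either beats `np.vectorize` on CPython or PyPy is something to measure, not assume.

```python
import math

import numpy as np


def f(t):
    # Hypothetical scalar per-sample function.
    return math.sin(2 * math.pi * 440 * t)


input_array = np.linspace(0.0, 1.0, 1000)

# List comprehension, then a single conversion to a float64 array.
via_list = np.array([f(x) for x in input_array.tolist()], dtype='d')

# np.fromiter consumes a generator directly; passing count lets NumPy
# allocate the output once instead of growing it as it goes.
via_fromiter = np.fromiter((f(x) for x in input_array.tolist()),
                           dtype='d', count=len(input_array))
```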
So here are a few questions:
1. Is there something like `vectorize`, except you get to access the output array before it's finished? If not, what do you think about adding that as an option to `vectorize`?
vectorize should allow an `out=` argument to pass in the output array; would that help you? You could then access it, but I am not sure how that would help you. Although you could create a big result array and then access chunks of it:

final_arr = np.empty(...)
newly_written = slice(0, 1000)
run_calculation(final_arr[newly_written])

where newly_written is defined by the input chunk you got, I suppose.
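As Ram finds later in the thread, `np.vectorize` does not actually take `out=`; NumPy ufuncs such as `np.sin` and `np.multiply` do. Here is a minimal sketch of the chunked pattern using ufunc `out=`, where `run_calculation`, `CHUNK`, `SAMPLE_RATE` and the 440 Hz tone are all hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44100  # hypothetical sample rate, in Hz
CHUNK = 1000         # hypothetical chunk length, in samples


def run_calculation(t, out):
    # Hypothetical stand-in for the run_calculation above: writes a 440 Hz
    # sine straight into `out` via the ufunc out= argument, so neither step
    # allocates a new result array.
    np.multiply(t, 2 * np.pi * 440, out=out)
    np.sin(out, out=out)


t = np.arange(5 * CHUNK) / SAMPLE_RATE
final_arr = np.empty(t.shape, dtype='d')

for start in range(0, len(t), CHUNK):
    newly_written = slice(start, start + CHUNK)
    # final_arr[newly_written] is a view, so writing into it fills final_arr.
    run_calculation(t[newly_written], out=final_arr[newly_written])
```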
2. Is there a more efficient way of writing the `for` loop I've written above? Or any other kind of solution to my problem?
As said, the main thing would be to modify `f` in whatever way possible. For that it would be useful to know what `f` does exactly. Maybe you can move `f` to Cython or Numba, or maybe write it in a way that works on arrays...
Thank you Sebastian and Andras for your detailed replies.

Sebastian, your suggestion of adding `item.item()` solved my problem! Now the for loop is still slower than vectorize, but by a smaller factor, and that's fast enough for my demonstration. My problem is solved and I'm very happy!

I also tried your `out=` suggestion for vectorize, but I think you made a mistake, as it doesn't seem to take that argument. If I missed something and it does (maybe it's a very new feature?), that would be even better for me than the `.item()` solution.

On Sun, Jul 12, 2020 at 5:03 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-07-13 at 15:45 +0300, Ram Rachum wrote:
Thank you Sebastian and Andras for your detailed replies.
Sebastian, your suggestion of adding `item.item()` solved my problem! Now the for loop is still slower than vectorize, but by a smaller factor, and that's fast enough for my demonstration. My problem is solved and I'm very happy!
I also tried your `out=` suggestion for vectorize, but I think you made a mistake, as it doesn't seem that it takes that argument. If I missed something and it does (maybe it's a very new feature?) that would be even better for me than the `.item()` solution.
You are right. I thought vectorize might be a proper ufunc internally in this branch (like frompyfunc), but `frompyfunc` currently does not support dtypes other than object (supporting them could be a nice improvement that would make vectorize more replaceable).

- Sebastian
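Sebastian's point about `frompyfunc` can be seen in a short sketch with a hypothetical `f`: the wrapped function is a real ufunc, so it accepts `out=`, but its results are always object arrays, and getting `float64` data back takes an explicit conversion.

```python
import math

import numpy as np


def f(t):
    # Hypothetical scalar per-sample function.
    return math.sin(2 * math.pi * 440 * t)


# Wrap f as a ufunc with 1 input and 1 output; its loop is object-only.
uf = np.frompyfunc(f, 1, 1)

t = np.linspace(0.0, 1.0, 8)
obj_result = uf(t)  # dtype is object, not float64

# Being a ufunc, it accepts out=, but the output array must be object dtype.
out = np.empty(t.shape, dtype=object)
uf(t, out=out)

# Getting a float64 array requires an extra conversion pass.
float_result = obj_result.astype('d')
```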
On 12/07/2020 07:00, Ram Rachum wrote:
The reason I want the second version is that I can then have sounddevice start playing `output_array` in a separate thread, while it's being calculated. (Yes, I know about the GIL, I believe that sounddevice releases it.)
I don't think this is a sound design. I don't know sounddevice, but in similar situations the standard pattern is to allocate a buffer (in this case it can be a NumPy array) and pass that to the consumer (sounddevice in your case). The consumer then tells the producer (your music synth) when it has to produce more data. At a quick read, it seems that the sounddevice.Stream class allows applying this pattern:

https://python-sounddevice.readthedocs.io/en/0.3.15/usage.html#callback-stre...

This also easily allows your producer function to operate on arrays rather than on single elements. Using NumPy functions to operate on arrays is going to be more efficient than iterating over the elements in Python.

Cheers,
Dan
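The consumer-driven pattern Dan describes can be sketched without audio hardware. This is a minimal producer assuming a mono or multichannel sine tone; the class name `SineSynth` and its parameters are hypothetical, and only the `(outdata, frames, time, status)` callback argument order is taken from sounddevice's callback streams. The producer works on a whole block at a time, as Dan suggests.

```python
import numpy as np

SAMPLE_RATE = 44100  # hypothetical sample rate, in Hz
FREQUENCY = 440.0    # hypothetical tone, in Hz


class SineSynth:
    """Producer that fills whatever buffer the audio consumer hands it."""

    def __init__(self, samplerate=SAMPLE_RATE, frequency=FREQUENCY):
        self.samplerate = samplerate
        self.frequency = frequency
        self.position = 0  # index of the next sample to produce

    def callback(self, outdata, frames, time, status):
        # Same argument order as a sounddevice output-stream callback;
        # outdata has shape (frames, channels) and is filled in place.
        t = (self.position + np.arange(frames)) / self.samplerate
        outdata[:] = np.sin(2 * np.pi * self.frequency * t)[:, np.newaxis]
        self.position += frames


# Outside a real stream, the consumer can be simulated by calling the
# callback directly with a preallocated buffer:
synth = SineSynth()
buffer = np.empty((512, 1), dtype='float32')
synth.callback(buffer, 512, None, None)

# With sounddevice installed, the hookup would look roughly like:
#     import sounddevice as sd
#     with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1,
#                          callback=synth.callback):
#         sd.sleep(1000)
```

Because `position` persists between calls, consecutive blocks join up into one continuous waveform regardless of the block size the consumer asks for.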
participants (4)
- Andras Deak
- Daniele Nicolodi
- Ram Rachum
- Sebastian Berg