[Numpy-discussion] An alternative to vectorize that lets you access the array?
sebastian at sipsolutions.net
Sun Jul 12 10:01:32 EDT 2020
On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
> Hi everyone,
> Here's a problem I've been dealing with. I wonder whether NumPy has a tool
> that will help me, or whether this could be a useful feature request.
> In the upcoming EuroPython 2020, I'll do a talk about live-coding a
> synthesizer. It's going to be a fun talk, I'll use the sounddevice
> <https://github.com/spatialaudio/python-sounddevice/> module to make a
> program that plays music. Do attend, or watch it on YouTube when it's
> out :)
Sounds like a fun talk :).
> There's a part in my talk that I could make simpler, and thus shave
> 3-4 minutes of cumbersome explanations. These 3-4 minutes matter a great
> deal to me. But for that I need to do something with NumPy and I don't
> know whether it's possible or not.
> The sounddevice library takes an ndarray of sound data and plays it.
> Currently I use `vectorize` to produce that array:
> output_array = np.vectorize(f, otypes='d')(input_array)
> And I'd like to replace it with this code, which is supposed to give the
> same output:
> output_array = np.ndarray(input_array.shape, dtype='d')
Maybe use `np.empty(input_array.shape, dtype="d")` instead.
`np.ndarray` works but is pretty low-level, and I would usually avoid
it for array creation.
> for i, item in enumerate(input_array):
>     output_array[i] = f(item)
OK, one hack that you can try is to replace `item` with `item.item()`.
That will convert the NumPy scalar to a Python scalar, which is quite a
lot more lightweight and faster. It might also give PyPy more chance
to optimize `f`, I suppose.
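
A minimal sketch of that `.item()` hack, for illustration only. The function `f` here is a hypothetical stand-in (the real `f` is not shown in this thread), and the input values are made up:

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the real per-sample function.
    return x * 0.5

input_array = np.linspace(0.0, 1.0, 8)
output_array = np.empty(input_array.shape, dtype="d")

for i, item in enumerate(input_array):
    # `item` is a NumPy scalar (np.float64); `.item()` converts it
    # to a plain Python float before it is handed to `f`.
    output_array[i] = f(item.item())
```

Whether this actually helps on PyPy would need to be measured; it only changes the type that `f` sees, not the loop itself.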
> The reason I want the second version is that I can then have sounddevice
> start playing `output_array` in a separate thread, while it's being
> calculated. (Yes, I know about the GIL, I believe that sounddevice
`np.vectorize` will definitely not release the GIL. This loop may release
it in between iterations (I am not sure), but it also adds quite a bit of
overhead compared to `vectorize`. The best thing, of course, would be if
you can rewrite `f` to accept an array.
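
To illustrate what "accept an array" could look like, here is a sketch under the assumption that `f` computes something like a sine tone from a time value. The names `f_scalar`, `f_array`, `SAMPLE_RATE` and `FREQUENCY` are all hypothetical and not taken from the original code:

```python
import math
import numpy as np

SAMPLE_RATE = 44100   # assumed sample rate in Hz
FREQUENCY = 440.0     # assumed tone frequency in Hz

def f_scalar(t):
    # Per-sample version: takes one time value, returns one sample.
    return math.sin(2 * math.pi * FREQUENCY * t)

def f_array(t):
    # Array version: the same formula using NumPy ufuncs, so the
    # whole array is processed in one call without a Python loop.
    return np.sin(2 * np.pi * FREQUENCY * t)

t = np.arange(1000) / SAMPLE_RATE
slow = np.vectorize(f_scalar, otypes="d")(t)
fast = f_array(t)
```

Both calls should produce the same samples; the array version simply avoids calling Python code once per element.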
> Unfortunately, the for loop is very slow, even when I'm not processing the
> data on a separate thread. I benchmarked it on both CPython and PyPy3, which
> is my target platform. On CPython it's 3 times slower than vectorize, and
> on PyPy3 it's 67 times slower than vectorize! That's despite the fact that
> the NumPy documentation says "The `vectorize` function is provided
> primarily for convenience, not for performance. The implementation is
> essentially a `for` loop."
PyPy is nice because it makes NumPy just work. Unfortunately, that also
adds some overhead, so at least some slowdown is probably expected. I
am not sure why it is so large.
I would not be surprised if a list comprehension is not much faster,
especially on PyPy (assuming you cannot modify `f` to work with arrays).
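
For comparison, a sketch of the list-comprehension route, plus `np.fromiter` as a closely related variant. `f` is again a hypothetical stand-in, and no claim is made here about which variant is faster on either interpreter:

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the real per-sample function.
    return x * x

input_array = np.linspace(0.0, 1.0, 8)

# tolist() yields plain Python floats, so `f` never sees NumPy scalars.
values = input_array.tolist()

# Variant 1: build a list, then convert it to an array in one step.
from_list = np.array([f(x) for x in values], dtype="d")

# Variant 2: stream the results into a preallocated array of known size.
from_iter = np.fromiter((f(x) for x in values), dtype="d", count=len(values))
```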
> So here are a few questions:
> 1. Is there something like `vectorize`, except you get to access the
> array before it's finished? If not, what do you think about adding that as
> an option to `vectorize`?
vectorize should allow an `out=` argument to pass in the output array,
would that help you? So you can access it, but I am not sure how that
will help you. Although you could create a big result array and then
access chunks of it:
final_arr = np.empty(...)
newly_written = slice(0, 1000)
where newly_written is defined by the input chunk you got, I suppose.
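
A rough sketch of that chunked idea, written as a generator so it does not rely on any `out=` argument. The helper `fill_in_chunks`, the stand-in `f_array` and the chunk size are assumptions made for illustration:

```python
import numpy as np

def f_array(chunk):
    # Hypothetical stand-in for an `f` that works on a whole chunk.
    return chunk * 2.0

def fill_in_chunks(input_array, final_arr, chunk_size=1000):
    # Fill `final_arr` one chunk at a time and yield the slice that
    # has just been written, so a consumer can use it right away.
    n = input_array.shape[0]
    for start in range(0, n, chunk_size):
        newly_written = slice(start, min(start + chunk_size, n))
        final_arr[newly_written] = f_array(input_array[newly_written])
        yield newly_written

input_array = np.arange(2500, dtype="d")
final_arr = np.empty(input_array.shape, dtype="d")
written = [s for s in fill_in_chunks(input_array, final_arr)]
```

In a real program the consumer would act on each yielded slice (for example, by handing `final_arr[newly_written]` to the playback code) instead of collecting the slices in a list.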
> 2. Is there a more efficient way of writing the `for` loop I've written
> above? Or any other kind of solution to my problem?
As said, the main thing would be to modify `f` in whatever way
possible. For that it would be useful to know what `f` does exactly.
Maybe you can move `f` to Cython or Numba, or maybe write it in a way that
works on arrays...
> Thanks for your help,
> Ram Rachum.