[Numpy-discussion] An alternative to vectorize that lets you access the array?

Andras Deak deak.andris at gmail.com
Sun Jul 12 09:42:57 EDT 2020


On Sun, Jul 12, 2020 at 3:02 PM Ram Rachum <ram at rachum.com> wrote:
>
> Hi everyone,
>
> Here's a problem I've been dealing with. I wonder whether NumPy has a tool that will help me, or whether this could be a useful feature request.
>
> In the upcoming EuroPython 2020, I'll do a talk about live-coding a music synthesizer. It's going to be a fun talk; I'll use the sounddevice module to make a program that plays music. Do attend, or watch it on YouTube when it's out :)
>
> There's a part in my talk that I could make simpler, and thus shave 3-4 minutes of cumbersome explanations. These 3-4 minutes matter a great deal to me. But for that I need to do something with NumPy and I don't know whether it's possible or not.
>
>
> The sounddevice library takes an ndarray of sound data and plays it. Currently I use `vectorize` to produce that array:
>
>     output_array = np.vectorize(f, otypes='d')(input_array)
>
> And I'd like to replace it with this code, which is supposed to give the same output:
>
>     output_array = np.ndarray(input_array.shape, dtype='d')
>     for i, item in enumerate(input_array):
>         output_array[i] = f(item)
>
> The reason I want the second version is that I can then have sounddevice start playing `output_array` in a separate thread, while it's being calculated. (Yes, I know about the GIL, I believe that sounddevice releases it.)
>
> Unfortunately, the for loop is very slow, even when I'm not processing the data on a separate thread. I benchmarked it on both CPython and PyPy3, which is my target platform. On CPython it's 3 times slower than vectorize, and on PyPy3 it's 67 times slower than vectorize! That's despite the fact that the NumPy documentation says "The `vectorize` function is provided primarily for convenience, not for performance. The implementation is essentially a `for` loop."
>
> So here are a few questions:
>
> 1. Is there something like `vectorize`, except you get to access the output array before it's finished? If not, what do you think about adding that as an option to `vectorize`?
>
> 2. Is there a more efficient way of writing the `for` loop I've written above? Or any other kind of solution to my problem?

Hi Ram,

I find your description of the behaviour really surprising! My
experience with np.vectorize has been consistent with the
documentation's note. Can you please provide some more context?
  1. What shape is your array?
  2. How exactly did you compute the runtimes?
  3. How long are the runtimes we are talking about? Are you sure
you're not measuring some kind of overhead?
  4. What kind of work does f do? This is mostly relevant for your
question about alternatives for your loop.
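
For reference, a minimal sketch of how the two versions could be timed
side by side with the standard `timeit` module. The function `f` and
the array size here are hypothetical stand-ins, not the code from the
original post, so the numbers will not match the ones reported above.

```python
import timeit

import numpy as np


def f(x):
    # Hypothetical stand-in for the real per-sample function.
    return x * 0.5


# Hypothetical input; the real array shape was not given in the thread.
input_array = np.linspace(0.0, 1.0, 1000)


def with_vectorize():
    return np.vectorize(f, otypes='d')(input_array)


def with_loop():
    output_array = np.empty(input_array.shape, dtype='d')
    for i, item in enumerate(input_array):
        output_array[i] = f(item)
    return output_array


# Take the best of several repeats to reduce noise from other processes.
vectorize_time = min(timeit.repeat(with_vectorize, number=5, repeat=3)) / 5
loop_time = min(timeit.repeat(with_loop, number=5, repeat=3)) / 5

print(f"vectorize: {vectorize_time:.6f} s per call")
print(f"loop:      {loop_time:.6f} s per call")
```

Timing whole function calls like this keeps array allocation inside
both measurements, so neither version gets an unfair head start.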

Unfortunately, I don't believe it is possible, or ever could be
possible, to give access to half-done results of a computation. As far
as I know, even asynchronous libraries make you wait until a result is
done. So unless you chop up your array along the first dimension and
explicitly work with each slice independently, I'm pretty sure this
cannot be done. Just imagine the wealth of possible race conditions if
you could have access to half-initialized arrays.
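
The chunking idea could look roughly like the sketch below. It is only
an illustration under stated assumptions: `f`, `fill_in_chunks` and
`chunk_size` are hypothetical names, and nothing here touches
sounddevice or threads. The generator reports how many leading
elements are finished after each chunk, so a consumer would only read
`output_array[:stop]`.

```python
import numpy as np


def f(x):
    # Hypothetical stand-in for the real per-sample function.
    return x * 0.5


def fill_in_chunks(func, input_array, chunk_size):
    """Fill the output chunk by chunk, yielding (output_array, stop).

    After each yield, output_array[:stop] holds finished values;
    everything past `stop` is still uninitialized.
    """
    vectorized = np.vectorize(func, otypes='d')
    output_array = np.empty(input_array.shape, dtype='d')
    for start in range(0, len(input_array), chunk_size):
        stop = min(start + chunk_size, len(input_array))
        output_array[start:stop] = vectorized(input_array[start:stop])
        yield output_array, stop


input_array = np.linspace(0.0, 1.0, 10)
finished_counts = []
for output_array, stop in fill_in_chunks(f, input_array, chunk_size=4):
    # A consumer could safely use output_array[:stop] at this point.
    finished_counts.append(stop)
```

Each slice is still computed with `np.vectorize`, so the per-element
cost stays the same as in the one-shot version; only the granularity
at which results become available changes.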

The only actionable suggestion I have for your loop is to replace the
`np.ndarray` call with one to `np.empty`. My impression has always
been that arrays should be instantiated with one of the helper
functions rather than directly from the type. Personally, I don't use
vectorize at all because I tend to find that it only misleads the
reader.
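
As a small sketch of that substitution (with a hypothetical `f` and
input standing in for the real ones), the loop with `np.empty` would
read as follows. Both calls return an uninitialized array of the same
shape and dtype, so this is a matter of style rather than speed.

```python
import numpy as np


def f(x):
    # Hypothetical stand-in for the real per-sample function.
    return x * 0.5


input_array = np.linspace(0.0, 1.0, 8)

# Helper-function form, in place of np.ndarray(input_array.shape, dtype='d').
output_array = np.empty(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    output_array[i] = f(item)

# The direct type call yields the same shape and dtype.
via_type = np.ndarray(input_array.shape, dtype='d')
same_layout = (via_type.shape == output_array.shape
               and via_type.dtype == output_array.dtype)
```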

Regards,

András

>
> Thanks for your help,
> Ram Rachum.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

