[Numpy-discussion] An alternative to vectorize that lets you access the array?

Ram Rachum ram at rachum.com
Mon Jul 13 08:45:00 EDT 2020


Thank you Sebastian and Andras for your detailed replies.

Sebastian, your suggestion of adding `item.item()` solved my problem! Now
the for loop is still slower than vectorize, but by a smaller factor, and
that's fast enough for my demonstration. My problem is solved and I'm very
happy!
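
For readers of the archive, here is a minimal sketch of the loop with
the `.item()` change applied. The function `f` below is a hypothetical
stand-in (a 440 Hz sine sample); the real `f` from the talk is not shown
in this thread, and `fill_with_loop` is a name chosen for illustration.

```python
import math
import numpy as np

def f(t):
    # Hypothetical stand-in for the real per-sample function:
    # a 440 Hz sine wave evaluated at time t (in seconds).
    return math.sin(2 * math.pi * 440.0 * t)

def fill_with_loop(input_array):
    # Preallocate the output so it exists before the loop starts filling it.
    output_array = np.empty(input_array.shape, dtype='d')
    for i, item in enumerate(input_array):
        # .item() converts the NumPy scalar to a plain Python float
        # before it is handed to f.
        output_array[i] = f(item.item())
    return output_array

input_array = np.linspace(0.0, 0.01, 441)
looped = fill_with_loop(input_array)
# Reference result from the original vectorize call, for comparison.
vectorized = np.vectorize(f, otypes='d')(input_array)
```

Both arrays should hold the same values; only the way `f` is driven
differs.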

I also tried your `out=` suggestion for vectorize, but I think you made a
mistake, as it doesn't seem that it takes that argument. If I missed
something and it does (maybe it's a very new feature?) that would be even
better for me than the `.item()` solution.
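
A possible way to get a similar effect without an `out=` argument is the
chunked approach from Sebastian's reply: preallocate the result and
assign each chunk's `vectorize` output into a slice of it. This is only
a sketch; `f`, `fill_in_chunks`, and the chunk size of 1000 are
hypothetical choices for illustration.

```python
import math
import numpy as np

def f(t):
    # Hypothetical stand-in for the real per-sample function.
    return math.sin(2 * math.pi * 440.0 * t)

def fill_in_chunks(input_array, output_array, chunk_size=1000):
    # Apply vectorize to one chunk at a time, so finished chunks of
    # output_array can be read before the whole computation ends.
    vectorized_f = np.vectorize(f, otypes='d')
    for start in range(0, len(input_array), chunk_size):
        newly_written = slice(start, start + chunk_size)
        output_array[newly_written] = vectorized_f(input_array[newly_written])
        # Report which part of output_array is now valid.
        yield newly_written

input_array = np.linspace(0.0, 1.0, 4410)
final_arr = np.empty(input_array.shape, dtype='d')
written = list(fill_in_chunks(input_array, final_arr))
```

Because `fill_in_chunks` is a generator, a consumer could act on each
yielded slice as soon as that chunk has been written.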


> On Sun, Jul 12, 2020 at 5:03 PM Sebastian Berg <sebastian at sipsolutions.net>
> wrote:
>
>> On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
>> > Hi everyone,
>> >
>> > Here's a problem I've been dealing with. I wonder whether NumPy has a
>> > tool
>> > that will help me, or whether this could be a useful feature request.
>> >
>> > In the upcoming EuroPython 2020, I'll do a talk about live-coding a
>> > music
>> > synthesizer. It's going to be a fun talk, I'll use the sounddevice
>> > <https://github.com/spatialaudio/python-sounddevice/> module to make
>> > a
>> > program that plays music. Do attend, or watch it on YouTube when it's
>> > out :)
>> >
>>
>> Sounds like a fun talk :).
>>
>> > There's a part in my talk that I could make simpler, and thus shave
>> > 3-4
>> > minutes of cumbersome explanations. These 3-4 minutes matter a great
>> > deal
>> > to me. But for that I need to do something with NumPy and I don't
>> > know
>> > whether it's possible or not.
>> >
>> >
>> > The sounddevice library takes an ndarray of sound data and plays it.
>> > Currently I use `vectorize` to produce that array:
>> >
>> >     output_array = np.vectorize(f, otypes='d')(input_array)
>> >
>> > And I'd like to replace it with this code, which is supposed to give
>> > the
>> > same output:
>> >
>> >     output_array = np.ndarray(input_array.shape, dtype='d')
>>
>> Maybe use `np.empty(input_array.shape, dtype="d")` instead.
>> `np.ndarray` works but is pretty low-level, and I would usually avoid
>> it for array creation.
>>
>> >     for i, item in enumerate(input_array):
>> >         output_array[i] = f(item)
>> >
>>
>> Ok, one hack that you can try, is to replace `item` with `item.item()`,
>> that will convert the NumPy scalar to a Python scalar, which is quite a
>> lot more lightweight and faster.  Also it might give PyPy more chance
>> to optimize `f` I suppose.
>>
>>
>> > The reason I want the second version is that I can then have
>> > sounddevice
>> > start playing `output_array` in a separate thread, while it's being
>> > calculated. (Yes, I know about the GIL, I believe that sounddevice
>> > releases
>> > it.)
>>
>> `np.vectorize` will definitely not release the GIL. This loop may
>> release it in between iterations (I am not sure), but it also adds quite
>> a bit of overhead compared to `vectorize`.  The best thing of course
>> would be if you can rewrite `f` to accept an array?
>>
>>
>> > Unfortunately, the for loop is very slow, even when I'm not
>> > processing the
>> > data on separate thread. I benchmarked it on both CPython and PyPy3,
>> > which
>> > is my target platform. On CPython it's 3 times slower than vectorize,
>> > and
>> > on PyPy3 it's 67 times slower than vectorize! That's despite the fact
>> > that
>> > the Numpy documentation says "The `vectorize` function is provided
>> > primarily for convenience, not for performance. The implementation is
>> > essentially a `for` loop."
>>
>> PyPy is nice because it makes NumPy just work. Unfortunately, that also
>> adds some overhead, so at least some slowdown is probably expected.  I
>> am not sure why it is so large.
>> I would not be surprised if a list comprehension is not much faster,
>> especially on PyPy (assuming you cannot modify `f` to work with
>> arrays).
>>
>> > So here are a few questions:
>> >
>> > 1. Is there something like `vectorize`, except you get to access the
>> > output
>> > array before it's finished? If not, what do you think about adding
>> > that as
>> > an option to `vectorize`?
>>
>> vectorize should allow an `out=` argument to pass in the output array,
>> would that help you?  So you can access it, but I am not sure how that
>> will help you.  Although you could create a big result array and then
>> access chunks of it:
>>
>>    final_arr = np.empty(...)
>>    newly_written = slice(0, 1000)
>>    run_calculation(final_arr[newly_written])
>>
>> where newly_written is defined by the input chunk you got, I suppose.
>>
>>
>> >
>> > 2. Is there a more efficient way of writing the `for` loop I've
>> > written
>> > above? Or any other kind of solution to my problem?
>>
>> As said, the main thing would be to modify `f` in whatever way
>> possible.  For that it would be useful to know what `f` does exactly.
>> Maybe you can move `f` to Cython or numba, or maybe write in a way that
>> works on arrays...
>>
>> >
>> > Thanks for your help,
>> > Ram Rachum.
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>