[Numpy-discussion] An alternative to vectorize that lets you access the array?

Sebastian Berg sebastian at sipsolutions.net
Mon Jul 13 10:52:40 EDT 2020


On Mon, 2020-07-13 at 15:45 +0300, Ram Rachum wrote:
> Thank you Sebastian and Andras for your detailed replies.
> 
> Sebastian, your suggestion of adding `item.item()` solved my problem!
> Now
> the for loop is still slower than vectorize, but by a smaller factor,
> and
> that's fast enough for my demonstration. My problem is solved and I'm
> very
> happy!
> 
> I also tried your `out=` suggestion for vectorize, but I think you
> made a
> mistake, as it doesn't seem that it takes that argument. If I missed
> something and it does (maybe it's a very new feature?) that would be
> even
> better for me than the `.item()` solution.
> 

You are right. I thought `vectorize` might be a proper ufunc internally
in this branch (like `frompyfunc`), but `frompyfunc` currently does not
support dtypes other than object. Supporting them could be a nice
improvement, and it would make `vectorize` more replaceable.
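
To illustrate the object-dtype limitation, here is a minimal sketch. The
scalar function `f` below is a hypothetical stand-in, since the real `f`
was not posted. The result of `frompyfunc` comes back with dtype object
and has to be converted explicitly:

```python
import math

import numpy as np


def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return math.sin(x)


input_array = np.linspace(0.0, 1.0, 5)

# One input, one output. The resulting ufunc only has an object loop.
ufunc = np.frompyfunc(f, 1, 1)

obj_result = ufunc(input_array)         # dtype is object, not float64
float_result = obj_result.astype("d")   # explicit conversion afterwards
```

The extra `astype` copy is the price of the object-only loop.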

- Sebastian


> 
> > On Sun, Jul 12, 2020 at 5:03 PM Sebastian Berg <
> > sebastian at sipsolutions.net>
> > wrote:
> > 
> > > On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
> > > > Hi everyone,
> > > > 
> > > > Here's a problem I've been dealing with. I wonder whether NumPy
> > > > has a
> > > > tool
> > > > that will help me, or whether this could be a useful feature
> > > > request.
> > > > 
> > > > In the upcoming EuroPython 2020, I'll do a talk about
> > > > live-coding a
> > > > music
> > > > synthesizer. It's going to be a fun talk, I'll use the
> > > > sounddevice
> > > > <https://github.com/spatialaudio/python-sounddevice/> module to
> > > > make
> > > > a
> > > > program that plays music. Do attend, or watch it on YouTube
> > > > when it's
> > > > out :)
> > > > 
> > > 
> > > Sounds like a fun talk :).
> > > 
> > > > There's a part in my talk that I could make simpler, and thus
> > > > shave
> > > > 3-4
> > > > minutes of cumbersome explanations. These 3-4 minutes matter a
> > > > great
> > > > deal
> > > > to me. But for that I need to do something with NumPy and I
> > > > don't
> > > > know
> > > > whether it's possible or not.
> > > > 
> > > > 
> > > > The sounddevice library takes an ndarray of sound data and
> > > > plays it.
> > > > Currently I use `vectorize` to produce that array:
> > > > 
> > > >     output_array = np.vectorize(f, otypes='d')(input_array)
> > > > 
> > > > And I'd like to replace it with this code, which is supposed to
> > > > give
> > > > the
> > > > same output:
> > > > 
> > > >     output_array = np.ndarray(input_array.shape, dtype='d')
> > > 
> > > Maybe use `np.empty(input_array.shape, dtype="d")` instead.
> > > `np.ndarray` works but is pretty low-level, and I would usually
> > > avoid
> > > it for array creation.
> > > 
> > > >     for i, item in enumerate(input_array):
> > > >         output_array[i] = f(item)
> > > > 
> > > 
> > > OK, one hack you can try is to replace `item` with
> > > `item.item()`. That converts the NumPy scalar to a Python
> > > scalar, which is considerably more lightweight and faster.
> > > It might also give PyPy more chance to optimize `f`, I
> > > suppose.
> > > 
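
Put together, the loop with both suggestions applied might look like the
sketch below. It assumes a 1-D `input_array` and uses a hypothetical `f`,
since the real one was not posted:

```python
import math

import numpy as np


def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return math.sin(x)


input_array = np.linspace(0.0, 1.0, 1000)

# np.empty allocates without initializing; every slot is filled below.
output_array = np.empty(input_array.shape, dtype="d")

for i, item in enumerate(input_array):
    # item is a NumPy float64 scalar; .item() turns it into a plain
    # Python float before it reaches f.
    output_array[i] = f(item.item())
```
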
> > > 
> > > > The reason I want the second version is that I can then have
> > > > sounddevice
> > > > start playing `output_array` in a separate thread, while it's
> > > > being
> > > > calculated. (Yes, I know about the GIL, I believe that
> > > > sounddevice
> > > > releases
> > > > it.)
> > > 
> > > `np.vectorize` will definitely not release the GIL. This loop
> > > may release it in between iterations (I am not sure), but it
> > > also adds quite a bit of overhead compared to `vectorize`.
> > > The best thing, of course, would be if you can rewrite `f` to
> > > accept an array.
> > > 
> > > 
> > > > Unfortunately, the for loop is very slow, even when I'm not
> > > > processing the
> > > > data on separate thread. I benchmarked it on both CPython and
> > > > PyPy3,
> > > > which
> > > > is my target platform. On CPython it's 3 times slower than
> > > > vectorize,
> > > > and
> > > > on PyPy3 it's 67 times slower than vectorize! That's despite
> > > > the fact
> > > > that
> > > > the Numpy documentation says "The `vectorize` function is
> > > > provided
> > > > primarily for convenience, not for performance. The
> > > > implementation is
> > > > essentially a `for` loop."
> > > 
> > > PyPy is nice because it makes NumPy just work. Unfortunately,
> > > that also adds some overhead, so at least some slowdown is
> > > probably expected.  I am not sure why it is so large.
> > > I would not be surprised if a list comprehension is not much
> > > faster,
> > > especially on PyPy (assuming you cannot modify `f` to work with
> > > arrays).
> > > 
> > > > So here are a few questions:
> > > > 
> > > > 1. Is there something like `vectorize`, except you get to
> > > > access the
> > > > output
> > > > array before it's finished? If not, what do you think about
> > > > adding
> > > > that as
> > > > an option to `vectorize`?
> > > 
> > > vectorize should allow an `out=` argument to pass in the output
> > > array,
> > > would that help you?  So you can access it, but I am not sure how
> > > that
> > > will help you.  Although you could create a big result array and
> > > then
> > > access chunks of it:
> > > 
> > >    final_arr = np.empty(...)
> > >    newly_written = slice(0, 1000)
> > >    run_calculation(final_arr[newly_written])
> > > 
> > > where newly_written is defined by the input chunk you got, I
> > > suppose.
> > > 
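
A slightly fuller sketch of the chunking idea, under stated assumptions:
`run_calculation` is a hypothetical stand-in that fills its output chunk
in place, and `finished` is an illustrative counter a consumer could
check before reading. Basic slices of an array are views, so writes into
a chunk land directly in `final_arr`:

```python
import numpy as np


def run_calculation(out_chunk, in_chunk):
    # Hypothetical stand-in for the real computation: fills out_chunk
    # in place from in_chunk.
    np.multiply(in_chunk, 2.0, out=out_chunk)


input_array = np.arange(10_000, dtype="d")
final_arr = np.empty(input_array.shape, dtype="d")

chunk_size = 1000
finished = 0  # number of leading elements that are safe to read

for start in range(0, input_array.size, chunk_size):
    newly_written = slice(start, start + chunk_size)
    # final_arr[newly_written] is a view, not a copy.
    run_calculation(final_arr[newly_written], input_array[newly_written])
    finished = min(start + chunk_size, input_array.size)
```

Anything reading `final_arr` concurrently should only touch
`final_arr[:finished]`.
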
> > > 
> > > > 2. Is there a more efficient way of writing the `for` loop I've
> > > > written
> > > > above? Or any other kind of solution to my
> > > 
> > > As said, the main thing would be to modify `f` in whatever way
> > > possible.  For that it would be useful to know what `f` does
> > > exactly.
> > > Maybe you can move `f` to Cython or numba, or maybe write in a
> > > way that
> > > works on arrays...
> > > 
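
As an illustration of "works on arrays" only: what `f` really does is
unknown, so the sketch below assumes it computes a sine tone from a time
value. `SAMPLE_RATE`, `frequency`, and `f_array` are all hypothetical
names. The point is that NumPy functions such as `np.sin` process the
whole array in one call, with no Python-level loop:

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed sample rate in Hz, for illustration


def f_array(t, frequency=440.0):
    # Hypothetical array version of `f`: t is an array of times in
    # seconds, and the result has the same shape.
    return np.sin(2.0 * np.pi * frequency * t)


# One second of time values.
input_array = np.arange(SAMPLE_RATE) / SAMPLE_RATE
output_array = f_array(input_array)
```
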
> > > > Thanks for your help,
> > > > Ram Rachum.
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
