Re: [Numpy-discussion] An alternative to vectorize that lets you access the array?

July 13, 2020


On Mon, 2020-07-13 at 15:45 +0300, Ram Rachum wrote:
...
Thank you Sebastian and Andras for your detailed replies.
Sebastian, your suggestion of adding `item.item()` solved my problem!
Now the for loop is still slower than vectorize, but by a smaller
factor, and that's fast enough for my demonstration. My problem is
solved and I'm very happy!
I also tried your `out=` suggestion for vectorize, but I think you made
a mistake, as it doesn't seem that it takes that argument. If I missed
something and it does (maybe it's a very new feature?) that would be
even better for me than the `.item()` solution.
You are right, I thought vectorize may be a proper ufunc internally in
this branch (like frompyfunc), but `frompyfunc` currently does not
support dtypes other than object (which could be a nice improvement to
make vectorize more replaceable).
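
To illustrate the `frompyfunc` limitation mentioned above, here is a minimal sketch. The function `f` and the sample input are hypothetical stand-ins, since the real `f` is not shown in this thread. The ufunc built by `np.frompyfunc` returns an object array, so an explicit cast is needed to get doubles back:

```python
import numpy as np

def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return x * 0.5

input_array = np.linspace(0.0, 1.0, 5)

# np.frompyfunc(func, nin, nout) builds a ufunc whose output dtype is object.
f_ufunc = np.frompyfunc(f, 1, 1)
as_objects = f_ufunc(input_array)

# Cast explicitly to get a float64 ('d') array.
output_array = as_objects.astype(np.float64)
```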

- Sebastian
...
...
On Sun, Jul 12, 2020 at 5:03 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
...
On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
...
Hi everyone,
Here's a problem I've been dealing with. I wonder whether NumPy has a
tool that will help me, or whether this could be a useful feature
request.
In the upcoming EuroPython 2020, I'll do a talk about live-coding a
music synthesizer. It's going to be a fun talk, I'll use the sounddevice
<https://github.com/spatialaudio/python-sounddevice/> module to make a
program that plays music. Do attend, or watch it on YouTube when it's
out :)
Sounds like a fun talk :).
...
There's a part in my talk that I could make simpler, and thus shave 3-4
minutes of cumbersome explanations. These 3-4 minutes matter a great
deal to me. But for that I need to do something with NumPy and I don't
know whether it's possible or not.
The sounddevice library takes an ndarray of sound data and plays it.
Currently I use `vectorize` to produce that array:
output_array = np.vectorize(f, otypes='d')(input_array)
And I'd like to replace it with this code, which is supposed to give the
same output:
output_array = np.ndarray(input_array.shape, dtype='d')
Maybe use `np.empty(input_array.shape, dtype="d")` instead.
`np.ndarray` works but is pretty low-level, and I would usually avoid
it for array creation.
...
for i, item in enumerate(input_array):
    output_array[i] = f(item)
OK, one hack that you can try is to replace `item` with `item.item()`.
That will convert the NumPy scalar to a Python scalar, which is quite a
lot more lightweight and faster.  It might also give PyPy a better
chance to optimize `f`, I suppose.
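
A minimal sketch of the loop with that change applied. The function `f` and the sample input are hypothetical stand-ins for the real ones, which are not shown in this thread:

```python
import numpy as np

def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return x * 0.5

input_array = np.linspace(0.0, 1.0, 5)
output_array = np.empty(input_array.shape, dtype="d")

for i, item in enumerate(input_array):
    # `item` is a NumPy scalar (np.float64); `.item()` turns it into
    # a plain Python float before it is handed to `f`.
    output_array[i] = f(item.item())
```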
...
The reason I want the second version is that I can then have sounddevice
start playing `output_array` in a separate thread, while it's being
calculated. (Yes, I know about the GIL, I believe that sounddevice
releases it.)
`np.vectorize` will definitely not release the GIL. This loop may
release it in between iterations (I am not sure), but it also adds
quite a bit of overhead compared to `vectorize`.  The best thing, of
course, would be if you can rewrite `f` to accept an array?
...
Unfortunately, the for loop is very slow, even when I'm not processing
the data on a separate thread. I benchmarked it on both CPython and
PyPy3, which is my target platform. On CPython it's 3 times slower than
vectorize, and on PyPy3 it's 67 times slower than vectorize! That's
despite the fact that the NumPy documentation says "The `vectorize`
function is provided primarily for convenience, not for performance. The
implementation is essentially a `for` loop."
PyPy is nice because it makes NumPy just work. Unfortunately, that also
adds some overhead, so at least some slowdown is probably expected.  I
am not sure why it is so large.
I would not be surprised if a list comprehension is not much faster,
especially on PyPy (assuming you cannot modify `f` to work with
arrays).
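
For comparison, the list-comprehension variant could look like the sketch below. `f` and the input are hypothetical stand-ins, and no speed claim is implied; it would need to be benchmarked on the target interpreter:

```python
import numpy as np

def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return x * 0.5

input_array = np.linspace(0.0, 1.0, 5)

# `.tolist()` converts all elements to Python floats in one call,
# then the results are packed back into a float64 array.
output_array = np.array([f(x) for x in input_array.tolist()], dtype="d")
```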
...
So here are a few questions:
1. Is there something like `vectorize`, except you get to access the
output array before it's finished? If not, what do you think about
adding that as an option to `vectorize`?
vectorize should allow an `out=` argument to pass in the output array,
would that help you?  So you can access it, but I am not sure how that
will help you.  Although you could create a big result array and then
access chunks of it:
final_arr = np.empty(...)
newly_written = slice(0, 1000)
run_calculation(final_arr[newly_written])
where newly_written is defined by the input chunk you got, I suppose.
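
One way the chunk idea could be spelled out, without relying on an `out=` argument, is sketched below. The function `f`, the input, and `chunk_size` are assumptions made for illustration; each chunk is computed with `np.vectorize` and written into its slice of the preallocated result:

```python
import numpy as np

def f(x):
    # Hypothetical scalar function standing in for the real `f`.
    return x * 0.5

vf = np.vectorize(f, otypes="d")

input_array = np.linspace(0.0, 1.0, 2500)
final_arr = np.empty(input_array.shape, dtype="d")
chunk_size = 1000  # assumed chunk length, mirroring slice(0, 1000)

for start in range(0, input_array.size, chunk_size):
    newly_written = slice(start, start + chunk_size)
    # Fill only this chunk; slicing past the end is clipped by NumPy.
    final_arr[newly_written] = vf(input_array[newly_written])
    # Everything in final_arr up to the end of this chunk is now valid
    # and could be handed to a consumer such as a playback thread.
```

Smaller chunks make data available sooner but pay the per-call overhead of `vectorize` more often.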
...
2. Is there a more efficient way of writing the `for` loop I've written
above? Or any other kind of solution to my
As said, the main thing would be to modify `f` in whatever way
possible.  For that it would be useful to know what `f` does exactly.
Maybe you can move `f` to Cython or numba, or maybe write in a way that
works on arrays...
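
As a sketch of what "works on arrays" could mean: the thread does not say what `f` computes, so the 440 Hz sine tone below is purely a hypothetical example, as are the names `f_scalar`, `f_array`, and the 44100 Hz sample rate. The array version replaces `math` calls with their NumPy counterparts and needs no Python-level loop:

```python
import math
import numpy as np

def f_scalar(t):
    # Hypothetical per-sample function: a 440 Hz sine at time t (seconds).
    return math.sin(2 * math.pi * 440.0 * t)

def f_array(t):
    # Same formula, but np.sin operates on the whole array at once.
    return np.sin(2 * np.pi * 440.0 * t)

t = np.arange(1000) / 44100.0  # assumed sample rate of 44100 Hz

expected = np.vectorize(f_scalar, otypes="d")(t)
output_array = f_array(t)
```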
...
Thanks for your help,
Ram Rachum.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion