Probably doesn't make a difference for this exercise, but numpy arrays make lousy replacements for a regular list -- i.e. as a container alone. The issue is that floats need to be "unboxed" and "boxed" as you put them into and pull them out of an array, whereas with lists, the float objects themselves are already there.
OK, maybe not as bad as I remember, but not great:
In [61]: def multiply(vect, scalar, out):
    ...:     """
    ...:     multiply all the elements in vect by a scalar, storing the results in out
    ...:     """
    ...:     for i, val in enumerate(vect):
    ...:         out[i] = val * scalar
    ...:
In [62]: arr = np.random.random((100000,))
In [63]: arrout = np.zeros_like(arr)
In [64]: l = list(arr)
In [65]: lout = [None] * len(l)
In [66]: %timeit multiply(arr, 1.1, arrout)
19.3 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [67]: %timeit multiply(l, 1.1, lout)
12.8 ms ± 83.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
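For anyone who wants to see the boxing directly, here is a minimal sketch (not part of the timing run above; the variable names are just for illustration). It assumes the list is built with list(arr) as in the session, so the list already holds one scalar object per element, while the array has to create a new one on every access. The last few lines show the vectorized call that avoids the Python-level loop altogether.

```python
import numpy as np

arr = np.random.random((1000,))
lst = list(arr)  # same construction as in the session above

# Each index into the array builds a new scalar object ("boxing"),
# so two accesses to the same slot give two distinct objects.
array_items_shared = arr[0] is arr[0]

# The list simply hands back the object it already stores.
list_items_shared = lst[0] is lst[0]

print(type(arr[0]).__name__, array_items_shared)
print(type(lst[0]).__name__, list_items_shared)

# Letting numpy do the loop: no per-element boxing at all.
out = np.empty_like(arr)
np.multiply(arr, 1.1, out=out)

# Same values as the element-by-element loop over the list.
loop_result = [val * 1.1 for val in lst]
print(np.allclose(out, loop_result))
```

This is only meant to illustrate where the per-element cost comes from; it says nothing about how the numbers would change on the nogil branch.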
> That said, I have now run my example code using both PYTHONGIL=0 and PYTHONGIL=1 of Sam's nogil branch as well as the following other Python3 versions:
* Conda Python3 (3.9.7)
* /usr/bin/python3 (3.9.1 in my case)
* 3.9 branch tip (3.9.7+)
The results were confusing, so I dredged up a copy of pystone to make sure I wasn't missing anything w.r.t. basic execution performance. I'm still confused, so will keep digging.
I'll be interested to see what you find out :-)
It would also be fun to see David Beazley’s example from his seminal talk:
Thanks, I'll take a look when I get a chance
That may not be the best source of the talk -- just the one I found first :-)
-CHB