> The problem with this approach is that we don't really want numpy
> holding on to hundreds of megabytes of memory by default, so it would
> need to be a user option.
Indeed -- but one could set the LRU cache to be very small (limited by number of items, not by a memory budget), so that it gets used within expressions but doesn't hold on to much memory outside of them.
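To make that concrete, here's a hypothetical sketch of the idea (the class name and sizes are made up for illustration -- this is not an actual numpy API): a tiny LRU cache of freed temporaries, keyed by (shape, dtype), so buffers get reused within an expression but only a handful are ever retained:

import numpy as np
from collections import OrderedDict

class TempArrayCache:
    """LRU cache of freed temporaries, keyed by (shape, dtype)."""
    def __init__(self, maxsize=4):    # a few items, not a memory limit
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, shape, dtype):
        """Reuse a cached buffer if one matches, else allocate."""
        key = (tuple(shape), np.dtype(dtype))
        arr = self._cache.pop(key, None)
        return arr if arr is not None else np.empty(shape, dtype)

    def put(self, arr):
        """Return a temporary to the cache, evicting the oldest entry."""
        self._cache[(arr.shape, np.dtype(arr.dtype))] = arr
        if len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)    # drop the oldest entry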
However, is the allocation the only (or even the biggest) source of the performance hit?
If you generate a temporary as the result of an operation, rather than working in place, that temporary needs to be allocated -- but it also means that an additional array has to be pushed through the processor, and that can make a big performance difference too.
I"m not entirely sure how to profile this correctly, but this seems to indicate that the allocation is cheap compared to the operations (for a million--element array)
* Regular old temporary creation:

In [24]: def f1(arr1, arr2):
    ...:     result = arr1 + arr2
    ...:     return result

In [26]: %timeit f1(arr1, arr2)
1000 loops, best of 3: 1.13 ms per loop
* Completely in-place, no allocation of an extra array
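(f2's definition isn't included in the message; judging from f3 below, it's presumably just the in-place add:)

def f2(arr1, arr2):
    arr1 += arr2
    return arr1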
In [28]: %timeit f2(arr1, arr2)
1000 loops, best of 3: 755 µs per loop
So the fully in-place version is about 30% faster.
* Allocate a temporary that isn't used -- but should catch the creation cost:

In [29]: def f3(arr1, arr2):
    ...:     result = np.empty_like(arr1)
    ...:     arr1 += arr2
    ...:     return arr1
In [30]: %timeit f3(arr1, arr2)
1000 loops, best of 3: 756 µs per loop
Only 1 µs slower!
Profiling is hard, and I'm not good at it, but this seems to indicate that the allocation is cheap.
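(A more direct check, not run in the original message, would be to time the allocation by itself, e.g.:)

%timeit np.empty_like(arr1)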
-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception