Luis Pedro Coelho wrote:
Hello all,
Numpy arrays come with several reduce operations: sum(), std(), argmin(), min(), ....
The traditional implementation of these suffers from two big problems: It is slow and it often allocates intermediate memory. I have code that is failing with OOM (out of memory) exceptions in calls to ndarray.std(). I regularly handle arrays with 100 million entries (have a couple of million objects * 20 features per object = 100 million doubles), so this is a real problem for me.
Luis, Thank you for your work and your enthusiasm. You are absolutely right that the default implementations are generic and therefore potentially slower and more memory consuming. There is generally a basic tradeoff between generic code and fast/memory-conserving code. The default implementations of std and var are much different than sum, min, argmin, etc. The main difference is that the latter are direct reduce methods on the ufuncs while the former are generic extensions using "python written with the Python C-API." Your approach using C++ templates is interesting, and I'm very glad for your explanation and your releasing of the code as open source. I'm not prepared to start using C++ in NumPy, however, so your code will have to serve as an example only. One way to do this without using templates is to extend the dtype functions array with additional function pointers (std, and var). This has been done several times in the past and it is probably advisable. In that case your code could very likely be used (using the C-compatible portions). I'm grateful you are willing to re-license that part of your code as BSD so that it can possibly be used in NumPy. Thanks so much. It is exciting to see interest in the code especially from students. Best regards, -Travis P.S. Specific answers to some of your questions below.
I am not correctly implementing the dtype parameter. I thought it controlled the type of the intermediate results, but if I do
The dtype parameter controls the type of the "reduction" but not the final result (which is always a float for the mean because of the division).
I see three possible paths for this: (1) You agree that this is nice, it achieves good results and I should strive to integrate this into numpy proper, replacing the current implementation. I would start a branch in numpy svn and implement it there and finally merge into the main branch. I would be perfectly happy to relicense to BSD if this is the case.
The idea of your code is great, but the C++ implementation cannot be directly used.
One could think of the array-to-array (A+B, A+=2, A != B,...) operations using a similar scheme This would imply that part of the C API would be obsoleted: the whole functions array would stop making sense. I don't know how widespread it's used.
I don't know what you mean by the "functions array". If you are talking about the ufuncs, then yes it is widespread.