inplace dot products

Hi numpist people, I discovered the ufunc and there ability to compute the results on preallocated arrays:
a = arange(10, dtype=float32) b = arange(10, dtype=float32) + 1 c = add(a, b, a) c is a True a array([ 1., 3., 5., 7., 9., 11., 13., 15., 17., 19.], dtype=float32)
My questions is : is there a way to have an equivalent for the dot product operation: I want atlas to build my dot products without allocating a temporary array and reuse a preallocated array of results. Suppose I have:
results = array((10, 3), dtype=float32) W = arange(6, dtype=float32).reshape((2, 3)) x = arange(20, dtype=float32).reshape((10, 2))
What I want is the equivalent of the following without the intermediate call to malloc:
results[:] = dot(x, W)
Any idea? I tried to introspect the various docstring of numpy core modules but I could not get any lead. -- Olivier

Hi Olivier, There was this idea posted on the Scipy-user list a while back: http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html but it doesn't look like he got anywhere with it, or even got a response. I just tried it and I observe the same behaviour. A quick look at the SciPy sources tells me there is something fishy. subroutine < tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb) ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0) ! Calculate C <- alpha * op(A) * op(B) + beta * C I don't read Fortran very well, but it seems to me as though the Fortran prototype doesn't match the python prototype. I'll poke around a little more, but in summary: there's no numpy- sanctioned way to specify an output array for a dot(), AFAIK. This is a bit of an annoyance, I agree, though I seem to remember Robert Kern offering a fairly compelling argument why it's hard. I just don't know what that argument is :) David On 18-Feb-09, at 4:06 AM, Olivier Grisel wrote:
Hi numpist people,
I discovered the ufunc and there ability to compute the results on preallocated arrays:
a = arange(10, dtype=float32) b = arange(10, dtype=float32) + 1 c = add(a, b, a) c is a True a array([ 1., 3., 5., 7., 9., 11., 13., 15., 17., 19.], dtype=float32)
My questions is : is there a way to have an equivalent for the dot product operation: I want atlas to build my dot products without allocating a temporary array and reuse a preallocated array of results. Suppose I have:
results = array((10, 3), dtype=float32) W = arange(6, dtype=float32).reshape((2, 3)) x = arange(20, dtype=float32).reshape((10, 2))
What I want is the equivalent of the following without the intermediate call to malloc:
results[:] = dot(x, W)
Any idea? I tried to introspect the various docstring of numpy core modules but I could not get any lead.
-- Olivier _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion

2009/2/20 David Warde-Farley <dwf@cs.toronto.edu>:
Hi Olivier,
There was this idea posted on the Scipy-user list a while back:
http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html
but it doesn't look like he got anywhere with it, or even got a response.
I just tried it and I observe the same behaviour. A quick look at the SciPy sources tells me there is something fishy.
subroutine < tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb) ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0) ! Calculate C <- alpha * op(A) * op(B) + beta * C
I don't read Fortran very well, but it seems to me as though the Fortran prototype doesn't match the python prototype.
I'll poke around a little more, but in summary: there's no numpy- sanctioned way to specify an output array for a dot(), AFAIK. This is a bit of an annoyance, I agree, though I seem to remember Robert Kern offering a fairly compelling argument why it's hard. I just don't know what that argument is :)
Alright, thanks for the reply. Is there a canonical way /sample code to gain low level access to blas / lapack atlas routines using ctypes from numpy / scipy code? I don't mind fixing the dimensions and the ndtype of my array if it can decrease the memory overhead. BTW, Robert if your insight on this topic would be very much appreciated. -- Olivier

Olivier Grisel wrote:
2009/2/20 David Warde-Farley <dwf@cs.toronto.edu>:
Hi Olivier,
There was this idea posted on the Scipy-user list a while back:
http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html
but it doesn't look like he got anywhere with it, or even got a response.
I just tried it and I observe the same behaviour. A quick look at the SciPy sources tells me there is something fishy.
subroutine < tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb) ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0) ! Calculate C <- alpha * op(A) * op(B) + beta * C
I don't read Fortran very well, but it seems to me as though the Fortran prototype doesn't match the python prototype.
I'll poke around a little more, but in summary: there's no numpy- sanctioned way to specify an output array for a dot(), AFAIK. This is a bit of an annoyance, I agree, though I seem to remember Robert Kern offering a fairly compelling argument why it's hard. I just don't know what that argument is :)
Alright, thanks for the reply.
Is there a canonical way /sample code to gain low level access to blas / lapack atlas routines using ctypes from numpy / scipy code?
You can just use ctypes to access ATLAS, as you would do for any library. Or do you mean something else ? cheers, David

On 20-Feb-09, at 6:39 AM, David Cournapeau wrote:
You can just use ctypes to access ATLAS, as you would do for any library. Or do you mean something else ?
Say, David... :) Do you have any idea why the pyf wrapper for fblas3 completely ignores the overwrite_c argument? Fiddling around I've found other blas/lapack functions where the same arg is offered but the choice actually does something. Regards, (Other) David

On Fri, Feb 20, 2009 at 06:25, David Warde-Farley <dwf@cs.toronto.edu> wrote:
On 20-Feb-09, at 6:39 AM, David Cournapeau wrote:
You can just use ctypes to access ATLAS, as you would do for any library. Or do you mean something else ?
Say, David... :)
Do you have any idea why the pyf wrapper for fblas3 completely ignores the overwrite_c argument?
I believe the "copy" is the culprit. double precision dimension(m,n),intent(in,out,copy),depend(m,n),optional :: c
Fiddling around I've found other blas/lapack functions where the same arg is offered but the choice actually does something.
Examples? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 20-Feb-09, at 10:39 AM, Robert Kern wrote:
Fiddling around I've found other blas/lapack functions where the same arg is offered but the choice actually does something.
Examples?
scipy.lib.lapack.flapack.dpotri, for example. I'm not sure of the proper usage, but when I pass it an identity matrix, depending on whether overwrite_c is True or not, the memory pointed to by the variable gets overwritten. David

On 20-Feb-09, at 10:39 AM, Robert Kern wrote:
Fiddling around I've found other blas/lapack functions where the same arg is offered but the choice actually does something.
Examples?
An even better example is scipy.linalg.fblas.dgemv, the matrix-vector equivalent of dgemm. overwrite_y behaves correctly there. David

On 20-Feb-09, at 6:41 AM, Olivier Grisel wrote:
Alright, thanks for the reply.
Is there a canonical way /sample code to gain low level access to blas / lapack atlas routines using ctypes from numpy / scipy code?
I don't mind fixing the dimensions and the ndtype of my array if it can decrease the memory overhead.
I got some clarification from Pearu Peterson off-list. For gemm the issue is that if the matrix C is not Fortran-ordered, it will be copied, and that copy will be over-written. order='F' when creating the array being overwritten will fix this. DWF

On Fri, Feb 20, 2009 at 05:18, David Warde-Farley <dwf@cs.toronto.edu> wrote:
Hi Olivier,
There was this idea posted on the Scipy-user list a while back:
http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html
but it doesn't look like he got anywhere with it, or even got a response.
I just tried it and I observe the same behaviour. A quick look at the SciPy sources tells me there is something fishy.
subroutine < tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb) ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0) ! Calculate C <- alpha * op(A) * op(B) + beta * C
I don't read Fortran very well, but it seems to me as though the Fortran prototype doesn't match the python prototype.
What do you mean? Based on the rest of the argument information in that block, f2py creates the Python prototype. For example, all of the m,n,k,lda,ka,ldb,kb dimensions are found from the input arrays themselves, optional arguments are given defaults, etc.
I'll poke around a little more, but in summary: there's no numpy- sanctioned way to specify an output array for a dot(), AFAIK. This is a bit of an annoyance, I agree, though I seem to remember Robert Kern offering a fairly compelling argument why it's hard. I just don't know
I believe I was only talking about why it would be hard to use a higher-precision accumulator. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
participants (4)
-
David Cournapeau
-
David Warde-Farley
-
Olivier Grisel
-
Robert Kern