Hello,

Say I have a few 1-d arrays and one 2-d array whose columns I want to be the 1-d arrays. I also want all the a arrays to share the *same data* as the b array. If I call my 1-d arrays a1, a2, etc. and my 2-d array b, then

    b[:,0] = a1[:]
    b[:,1] = a2[:]
    ...

won't work because apparently copying occurs. I tried it the other way around, i.e.

    a1 = b[:,0]
    a2 = b[:,1]
    ...

and it works, but that doesn't help me with my problem. Is there a way to reformulate the first code snippet above but with shallow copying?

Thanks,
-- Hugo Gagnon
On Sat, 26 Mar 2011 13:10:42 -0400, Hugo Gagnon wrote: [clip]
    a1 = b[:,0]
    a2 = b[:,1]
    ...
and it works but that doesn't help me for my problem. Is there a way to reformulate the first code snippet above but with shallow copying?
No. You need a 2-D array to "own" the data. The second way is the approach to use if you want to share the data.

-- Pauli Virtanen
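A minimal sketch of this recommended pattern -- allocate b first, then bind the names a1, a2 to column views (np.shares_memory is a later NumPy addition, used here only to verify the sharing):

```python
import numpy as np

n = 5
b = np.zeros((n, 2))        # allocate the single owning block first
a1 = b[:, 0]                # views: no data is copied
a2 = b[:, 1]

a1[:] = np.arange(n)        # writing through a view updates b
assert np.shares_memory(a1, b)
assert b[3, 0] == 3.0
```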
On 3/26/11 10:12 AM, Pauli Virtanen wrote:
No. You need an 2-D array to "own" the data. The second way is the approach to use if you want to share the data.
Exactly -- but to clarify, it's not just about ownership, it's about the layout of the data in memory. The data in a numpy array needs to be laid out in memory as one block, with consistent strides from one element to the next, one row to the next, etc.

When you create an array from scratch (like your 2-d array here), you get one big block of memory. If you create each row separately, they each have their own block of memory that are unrelated -- there is no way to put those together into one block with consistent strides. So you need to create that big block first (the 2-d array), then you can reference parts of it for each row.

See my previous note for a bit more discussion. Oh, and maybe the little presentation and sample code I gave to the Seattle Python Interest Group will help: http://www.seapig.org/November2010Notes

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R          (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115      (206) 526-6317 main reception
Chris.Barker@noaa.gov
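The distinction can be seen directly (a sketch; np.shares_memory is a later NumPy addition used for checking): assigning a separately created array into a column necessarily copies, while slices carved out of the one block are views.

```python
import numpy as np

a1 = np.ones(4)
a2 = np.ones(4)
b = np.empty((4, 2))
b[:, 0] = a1            # copies: a1's separate block cannot be absorbed into b's
b[:, 1] = a2
assert not np.shares_memory(b, a1)

rows = [b[i, :] for i in range(4)]   # views carved out of b's single block
assert all(np.shares_memory(r, b) for r in rows)
```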
Hi,

I am also interested in this. In my application there is a large 2-d array; let's call it 'b' to keep the notation consistent in the thread. b's columns need to be recomputed often. Ideally this re-computation happens in a function; let's call that function updater(b, col_index). The simplest example is where updater(b, col_index) is a matrix-vector multiply, where the matrix or the vector changes.

Is there any way, apart from using ufuncs, that I can make updater() write the result directly into b and not create a new temporary column that is then copied into b? Say for the matrix-vector multiply example. I can write the matrix-vector product in terms of ufuncs but will lose out in terms of speed.

In the best-case scenario I would like to maintain 'b' in CSR sparse matrix form, as 'b' participates in a matrix-vector multiply. I think CSR would be asking for too much, but even CSC should help. I don't want to clutter this thread with the sparsity issues though; any solution to the original question or pointers to solutions would be appreciated.

Thanks
--srean

On Sat, Mar 26, 2011 at 12:10 PM, Hugo Gagnon <sourceforge.numpy@user.fastmail.fm> wrote:
On 3/26/11 10:32 AM, srean wrote:
I am also interested in this. In my application there is a large 2d array, lets call it 'b' to keep the notation consistent in the thread. b's columns need to be recomputed often. Ideally this re-computation happens in a function. Lets call that function updater(b, col_index): The simplest example is where updater(b, col_index) is a matrix vector multiply, where the matrix or the vector changes.
Is there anyway apart from using ufuncs that I can make updater() write the result directly in b and not create a new temporary column that is then copied into b ? Say for the matrix vector multiply example.
Probably not -- the trick is that when an array is a view of a slice of another array, it may not be laid out in memory in a way that other libs (like LAPACK, BLAS, etc.) require, so the data needs to be copied to call those routines.

To understand all this, you'll need to study up a bit on how numpy arrays lay out and access the memory that they use: they use a concept of "strided" memory. It's very powerful and flexible, but most other numeric libs can't use those same data structures. I'm not sure what a good doc is to read to learn about this -- I learned it from messing with the C API. Take a look at any docs that talk about "strides", and maybe playing with the "stride tricks" tools will help.

A simple example:

    In [3]: a = np.ones((3,4))

    In [4]: a
    Out[4]:
    array([[ 1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.]])

    In [5]: a.flags
    Out[5]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False

So a is a (3,4) array, stored in C-contiguous fashion, just like a "regular old C array". A lib expecting data in this fashion could use the data pointer just like regular C code.

    In [6]: a.strides
    Out[6]: (32, 8)

This means it is 32 bytes from the start of one row to the next, and 8 bytes from the start of one element to the next -- which makes sense for a 64-bit double.

    In [7]: b = a[:,1]

    In [10]: b
    Out[10]: array([ 1.,  1.,  1.])

So b is a 1-d array with three elements.

    In [8]: b.flags
    Out[8]:
      C_CONTIGUOUS : False
      F_CONTIGUOUS : False
      OWNDATA : False
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False

But it is NOT C-contiguous -- the data is laid out differently than a standard C array.

    In [9]: b.strides
    Out[9]: (32,)

So this means that it is 32 bytes from one element to the next -- for an 8-byte data type. This is because the elements are each one element in a row of the a array -- they are not all next to each other. A regular C library generally won't be able to work with data laid out like this.
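The same session, condensed into a runnable script (the stride values assume 8-byte doubles):

```python
import numpy as np

a = np.ones((3, 4))                  # C-contiguous block of float64
assert a.flags['C_CONTIGUOUS']
assert a.strides == (32, 8)          # 32 bytes row-to-row, 8 element-to-element

b = a[:, 1]                          # one column: a strided view
assert b.base is a                   # b does not own its data
assert b.strides == (32,)            # 32 bytes between consecutive elements
assert not b.flags['C_CONTIGUOUS']   # not laid out like a plain C array
```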
HTH,
-Chris
Hi Christopher,

Thanks for taking the time to reply at length. I do understand the concept of striding in general but was not familiar with the NumPy way of accessing that information. So thanks for pointing me to .flags and .strides.

That said, BLAS/LAPACK do have APIs that take the stride length into account. But for sparse arrays I think it's a hopeless situation. That is a bummer, because sparse is what I need. Oh well, I will probably do it in C++.

-- srean

p.s. I hope top posting is not frowned upon here. If so, I will keep that in mind in my future posts.
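For the record, the element stride that the strided BLAS entry points (the incx/incy arguments) would need can be read straight off a NumPy view; a sketch:

```python
import numpy as np

a = np.ones((3, 4))
col = a[:, 1]                          # non-contiguous column view
incx = col.strides[0] // col.itemsize  # convert byte stride to element stride
assert incx == 4                       # one full row of a per step
```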
Den 26.03.2011 19:31, skrev Christopher Barker:
To understand all this, you'll need to study up a bit on how numpy arrays lay out and access the memory that they use: they use a concept of "strided" memory. It's very powerful and flexible, but most other numeric libs can't use those same data structures.
With the ISO C bindings in Fortran 2003, we can get a Fortran pointer from a C pointer. This means it is actually possible to pass "strided memory" from NumPy to Fortran libraries in a portable way. A Fortran pointer is more similar to a NumPy view array than a C pointer is. If I am not mistaken, fwrap can generate the boilerplate Cython and Fortran code. Sturla
On 03/27/2011 08:54 PM, Sturla Molden wrote:
With the ISO C bindings in Fortran 2003, we can get a Fortran pointer from a C pointer. This means it is actually possible to pass "strided memory" from NumPy to Fortran libraries in a portable way. A Fortran pointer is more similar to a NumPy view array than a C pointer. If I am not mistaken, fwrap can generate the boiler-plate Cython and Fortran code.
fwrap currently ensures that arrays are Fortran-contiguous. Supporting strided arrays has been in the back of our minds, but not much thought has been given to it really... What would we do exactly -- pass the entire underlying buffer to Fortran and then re-slice it Fortran-side? That's more complicated than doing a copy Python-side, so unless the Fortran compiler actually supports keeping the array strided throughout the code, it's not worth it. I'm not aware of the rules here -- basically, if you keep an array assumed-shape, it can be passed around in non-contiguous form in Fortran? And it will be copied whenever it is passed as an explicit-shape array? Dag Sverre
Den 28.03.2011 09:34, skrev Dag Sverre Seljebotn:
What would we do exactly -- pass the entire underlying buffer to Fortran and then re-slice it Fortran side?
Pass a C pointer to the first element along with shape and strides, get a Fortran pointer using c_f_pointer, then reslice the Fortran pointer to a second Fortran pointer. Remember to argsort the dimensions and strides by stride in ascending order.
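The argsort step Sturla describes, sketched on the Python side with NumPy (the `_f` names are illustrative, not part of any API):

```python
import numpy as np

a = np.ones((3, 4))[:, ::2]            # strided view: strides (32, 16) for float64
order = np.argsort(a.strides)          # axis order by ascending stride
shape_f = tuple(a.shape[i] for i in order)     # dimensions, Fortran-style order
strides_f = tuple(a.strides[i] for i in order) # strides, now ascending
assert strides_f == tuple(sorted(a.strides))
```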
That's more complicated than doing a copy Python-side, so unless the Fortran compiler actually supports keeping the array strided throughout the code it's not worth it. I'm not aware of the rules here -- basically, if you keep an array assumed-shape, it can be passed around in non-contiguous form in Fortran?
The Fortran standard does not specify this. A compiler is free to make a copy (copy-in copy-out), which is why a function call can invalidate a pointer. A compiler can decide to make a local copy, pass a dope array struct, or even remap virtual memory. It might e.g. depend on optimization rules, such as whether to favour speed or size. We cannot instruct a Fortran compiler to use strided memory, but we can allow it to use strided memory if it wants.
And it will be copied whenever it is passed as an explicit-shape array?
That is also compiler-dependent. The standard just says that array indexing shall work. There is no requirement that explicit-shape arrays are contiguous; a compiler is free to use strided memory here as well. Most compilers will assume that explicit-shape and assumed-size arrays are contiguous and passed as a pointer to the first element. But the standard does not require this, which is why there was no portable way of interfacing C and Fortran prior to Fortran 2003. Only a C array using Fortran 2003 C bindings is assumed contiguous by the standard. Apart from that, the standard does not care about the binary representation. f2py actually depends on knowing implementation details for common Fortran compilers, not a standard C interface.

I seem to remember that some Fortran 77 compilers even used virtual memory remapping to make strided memory look contiguous to the callee, instead of making a local copy. That would be an alternative to a dope array, while preserving a simple interface to C.

So the answer is: this is messy. The Fortran standard only requires indexing to work as expected, and Fortran compilers can do whatever they want to achieve this. Just to repeat myself, "we cannot instruct a Fortran compiler to use strided memory, but we can allow it to use strided memory if it wants."

It is indeed easier to make a local contiguous copy when calling Fortran. My experience when working with C is that "copy-in copy-out" is faster for computation than strided memory, particularly when it can fit in CPU cache. So just making a copy can be a good strategy, but that depends on the array being small. Generally I would say that it is a bad idea to try to be smarter than a Fortran compiler when it comes to decisions on array access: those that make Fortran compilers have spent man-years tuning this. So my preference is to just tell Fortran that memory is strided, and let it do whatever it wants.

Sturla
On 03/28/2011 12:55 PM, Sturla Molden wrote:
Sure, I realize that it is not standard. I'm mostly wondering whether major Fortran compilers support working with strided memory in practice (defined as: you won't get out-of-memory errors when passing around a huge strided array subset). If no compilers actually support it in practice, the priority for this functionality in fwrap can be put fairly low. I totally agree it's the right thing to do; it's a question of priorities. Dag Sverre
Den 28.03.2011 14:28, skrev Dag Sverre Seljebotn:
Sure, I realize that it is not standard. I'm mostly wondering whether major Fortran compilers support working with strided memory in practice (defined as you won't get out-of-memory-errors when passing around huge strided array subset).
I'll try to clarify this:

** Most Fortran 90 compilers (and beyond) support strided memory for assumed-shape and deferred-shape arrays. That is, arrays declared like a(:,:), and/or with the 'allocatable' or 'pointer' attribute. They are usually passed as a 'dope array' C struct, i.e. similar to NumPy's PyArrayObject. These arrays are not a part of Fortran 77, but passing them to Fortran 77 subroutines is required to work. The compiler is free to make a local copy if it wants.

** Most Fortran 77 compilers (and beyond) assume explicit-shape and assumed-size arrays are contiguous blocks of memory. That is, arrays declared like a(m,n) or a(m,*). They are usually passed as a pointer to the first element. These are the only types of Fortran arrays f2py supports.

** Most Fortran compilers will make a temporary copy when passing a non-contiguous array section to a subroutine expecting an explicit-shape or assumed-shape array.

** All Fortran 2003 compilers will assume a C array is contiguous.

Ok?

Sturla
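The temporary-copy behaviour in the third point is what a wrapper has to emulate on the Python side when the callee expects contiguous memory; a sketch of that "copy-in" step with NumPy:

```python
import numpy as np

a = np.random.randn(100, 100)
sub = a[::2, ::2]                      # strided subsection, not contiguous
assert not sub.flags['F_CONTIGUOUS']

f = np.asfortranarray(sub)             # "copy-in": contiguous buffer for Fortran
assert f.flags['F_CONTIGUOUS']
assert not np.shares_memory(f, a)      # it is a copy, not a view
```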
On Mon, Mar 28, 2011 at 6:01 PM, Sturla Molden wrote:
I'll try to clarify this:
** Most Fortran 77 compilers (and beyond) assume explicit-shape and assumed-size arrays are contiguous blocks of memory. That is, arrays declared like a(m,n) or a(m,*). They are usually passed as a pointer to the first element. These are the only type of Fortran arrays f2py supports.
FYI, f2py in numpy 1.6.x supports also assumed shape arrays. Pearu
On Mon, Mar 28, 2011 at 10:44 PM, Sturla Molden wrote:
Den 28.03.2011 19:12, skrev Pearu Peterson:
FYI, f2py in numpy 1.6.x supports also assumed shape arrays.
How did you do that? Chasm-interop, C bindings from F03, or marshalling through explicit-shape?
The latter. Basically, if you have

    subroutine foo(a)
      real a(:)
    end

then f2py automatically creates a wrapper subroutine

    subroutine wrapfoo(a, n)
      real a(n)
      integer n
      !f2py intent(in) :: a
      !f2py intent(hide) :: n = shape(a,0)
      interface
        subroutine foo(a)
          real a(:)
        end
      end interface
      call foo(a)
    end

that can be wrapped with f2py in the ordinary way.
Can f2py pass strided memory from NumPy to Fortran?
No. I haven't thought about it. Pearu
On Tue, Mar 29, 2011 at 8:13 AM, Pearu Peterson wrote:
Now, after a little bit of thinking and testing, I think supporting strided arrays in f2py is easily doable. For the example above, f2py just must generate the following wrapper subroutine:

    subroutine wrapfoo(a, stride, n)
      real a(n)
      integer n, stride
      !f2py intent(in) :: a
      !f2py intent(hide) :: n = shape(a,0)
      !f2py intent(hide) :: stride = getstrideof(a)
      interface
        subroutine foo(a)
          real a(:)
        end
      end interface
      call foo(a(1:stride:n))
    end

Now the question is, how important would this feature be? How high should I put it in my todo list? If there is interest, the corresponding numpy ticket should be assigned to me.

Best regards,
Pearu
On 03/29/2011 09:35 AM, Pearu Peterson wrote:
I think it should be a(1:n*stride:stride) or something. Dag Sverre
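The corrected bounds check out in the NumPy analogue (Fortran's a(1:u:s) is 1-based and inclusive, NumPy's a[0:u:s] is 0-based and exclusive, but the element count is the same):

```python
import numpy as np

flat = np.arange(12.0)
n, stride = 4, 3
view = flat[0 : n * stride : stride]   # every stride-th element, n of them
assert view.shape == (n,)
assert view.tolist() == [0.0, 3.0, 6.0, 9.0]
```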
Den 28.03.2011 14:28, skrev Dag Sverre Seljebotn:
Sure, I realize that it is not standard. I'm mostly wondering whether major Fortran compilers support working with strided memory in practice (defined as you won't get out-of-memory-errors when passing around huge strided array subset).
On 64-bit systems one could also use virtual memory (re)mapping for this, i.e. a contiguous block of virtual memory could alias a strided block of virtual memory, avoiding a local copy. That could be convenient for passing huge arrays to Fortran 77 subroutines, e.g. LAPACK, for which most implementations require arrays to be contiguous. Sturla
On Sat, 26 Mar 2011 12:32:24 -0500, srean wrote: [clip]
Is there anyway apart from using ufuncs that I can make updater() write the result directly in b and not create a new temporary column that is then copied into b? Say for the matrix vector multiply example. I can write the matrix vector product in terms of ufuncs but will lose out in terms of speed.
Well, you can e.g. write

    def updater(b, col_idx):
        b[:,col_idx] *= 3   # <- modifies b[:,col_idx] in place

And ditto for sparse matrices --- but maybe this is not what you asked.

If you want to have control over temporaries, you can make use of the out= argument of ufuncs (`numpy.dot` will gain it in 1.6.1 --- you can call LAPACK routines from scipy.lib in the meantime, if your data is in Fortran order).

Also, numexpr is probably able to write the output directly to a given array --- using it is an alternative way to avoid temporaries, and probably easier to write than doing things via the out= arguments.

For sparse matrices, things then depend on how they are laid out in memory. You can probably alter the `.data` attribute of the arrays directly, if you know how the underlying representation works.

-- Pauli Virtanen
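The out= mechanism, in a runnable sketch: any ufunc can write straight through a column view, so no temporary column is allocated.

```python
import numpy as np

b = np.ones((4, 3))
col = b[:, 1]                    # view into b
np.multiply(col, 3.0, out=col)   # ufunc writes in place through the view
assert b[0, 1] == 3.0
assert b[0, 0] == 1.0            # other columns untouched
```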
On Sat, 26 Mar 2011 19:13:43 +0000, Pauli Virtanen wrote: [clip]
If you want to have control over temporaries, you can make use of the out= argument of ufuncs (`numpy.dot` will gain it in 1.6.1 --- you can call LAPACK routines from scipy.lib in the meantime, if your data is in Fortran order).
Like so:

    # Fortran-order for efficient DGEMM -- each column must be contiguous
    A = np.random.randn(4,4).copy('F')
    b = np.random.randn(4,10).copy('F')

    def updater(b, col_idx):
        # This will work in Numpy 1.6.1
        dot(A, b[:,col_idx].copy(), out=b[:,col_idx])

In the meantime you can do

    A = np.random.randn(4,4).copy('F')
    b = np.random.randn(4,10).copy('F')

    from scipy.lib.blas import get_blas_funcs
    gemm, = get_blas_funcs(['gemm'], [A, b])   # get correct type func

    def updater(b, col_idx):
        bcol = b[:,col_idx]
        c = gemm(1.0, A, bcol.copy(), 0.0, bcol, overwrite_c=True)
        assert c is bcol   # check that it didn't make copies!

Note that DGEMM and `dot` cannot do in-place multiplication -- at least the BLAS library I have fails when the B and C arguments point to the same memory, so you'll anyway end up with one temporary. (This has nothing to do with Scipy -- the same occurs in Fortran.)

-- Pauli Virtanen
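With NumPy 1.6.1 and later the first snippet runs as written; a self-contained check (Fortran order makes each column contiguous, which `dot`'s out= requires, and the input column is copied because `dot` cannot read and write the same memory):

```python
import numpy as np

A = np.random.randn(4, 4).copy('F')
b = np.random.randn(4, 10).copy('F')

def updater(b, col_idx):
    # copy the input column to avoid aliasing, write the result back in place
    np.dot(A, b[:, col_idx].copy(), out=b[:, col_idx])

before = b[:, 2].copy()
updater(b, 2)
assert np.allclose(b[:, 2], A.dot(before))   # column updated in place
```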
Ah! Very nice. I did not know that numpy-1.6.1 supports in-place 'dot', nor that you could access the underlying BLAS functions like this. This is pretty neat. Thanks. Now I at least have an idea how the sparse version might work.

If I get time I will probably give numpy-1.6.1 a shot. I already have the MKL libraries thanks to the free version of EPD for students.
participants (7)
- Christopher Barker
- Dag Sverre Seljebotn
- Hugo Gagnon
- Pauli Virtanen
- Pearu Peterson
- srean
- Sturla Molden