Paulo J. S. Silva wrote:
Em Qua, 2006-01-18 às 11:15 -0700, Travis Oliphant escreveu:
Will you run these again with the latest SVN version of numpy. I couldn't figure out why a copy was being made on transpose (because it shouldn't have been). Then, I dug deep into the PyArray_FromAny code and found bad logic in when a copy was needed that was causing an inappropriate copy.
I fixed that and now wonder how things will change. Because presumably, the dotblas function should handle the situation now...
Good work Travis :-)
Tests x.T*y x*y.T A*x A*B A.T*x half 2in2
Dimension: 5 Array 0.9000 0.2400 0.2000 0.2600 0.7100 0.9400 1.1600 Matrix 4.7800 1.5700 0.6200 0.7600 1.0600 3.0400 4.6500 NumArr 3.2900 0.7400 0.6800 0.7800 8.4800 7.4200 11.6600 Numeri 1.3300 0.3900 0.3100 0.4200 0.7900 0.6800 0.7600 Matlab 1.88 0.44 0.41 0.35 0.37 1.20 0.98
Dimension: 50 Array 9.0000 2.1400 0.5500 18.9500 1.4100 4.2700 4.4500 Matrix 48.7400 3.9200 1.0100 20.2000 1.8000 6.5000 8.1900 NumArr 32.3900 2.6800 1.0000 18.9700 13.0300 8.6300 13.0700 Numeri 13.1000 2.2600 0.6500 18.2700 10.1500 1.0400 3.2600 Matlab 16.98 1.94 1.07 17.86 0.73 1.57 1.77
Dimension: 500 Array 1.1400 9.2300 2.0100 168.2700 2.1800 4.0200 4.2900 Matrix 5.0300 9.3500 2.1500 167.5300 2.1700 4.1100 4.4200 NumArr 3.4400 9.1000 2.1000 168.7100 21.8400 4.3900 5.8900 Numeri 1.5800 9.2700 2.0700 167.5600 20.0500 3.4000 4.6800 Matlab 2.09 6.07 2.17 169.45 2.10 2.56 3.06
Note the 10-fold speed-up for higher dimensions :-)
It looks like that now that numpy only looses to matlab in small dimensions. Probably, the problem is the creation of the object to represent the transposed object. Probably Matlab creation of objects is very lightweight (they only have matrices objects to deal with). Probably this phenomenon explains the behavior for the indexing operations too.
Paulo
Here's an idea Fernando and I have briefly talked about off-list, but which perhaps bears talking about here: Is there speed to be gained by an alternative, very simple, very optimized ndarray constructor? The idea would be a special-case constructor with very limited functionality designed purely for speed. It wouldn't support (m)any of the fantastic things Travis has done, but would be useful only in specialized use cases, such as creating indices. I'm not familiar enough with what the normal constructor does to know if we could implement something, (in C, perhaps) that would do nothing but create a simple, contiguous array significantly faster than what is currently done. Or does the current constructor create a new instance about as fast as possible? I know Travis has optimized it, but it's a general purpose constructor, and I'm thinking these extra features may take some extra CPU cycles. Cheers! Andrew