fastest way to make two vectors into an array
I have two equal length 1D arrays of 256-4096 complex or floating point numbers which I need to put into a shape=(len(x),2) array. I need to do this a lot, so I would like to use the most efficient means. Currently I am doing:

    def somefunc(x,y):
        X = zeros( (len(x),2), typecode=x.typecode())
        X[:,0] = x
        X[:,1] = y
        do_something_with(X)

Is this the fastest way?

Thanks,
John Hunter
* John Hunter [2003-01-29 22:13]:
    def somefunc(x,y):
        X = zeros( (len(x),2), typecode=x.typecode())
        X[:,0] = x
        X[:,1] = y
        do_something_with(X)
Is this the fastest way?
    X = transpose(array([x]+[y]))

It may not be the fastest possible way, but should be about a factor of two faster; better than nothing.

Cheers,
Joachim
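The thread above uses the long-retired Numeric module. As a rough sketch of Joachim's suggestion in modern NumPy (an assumption on my part; NumPy's spelling differs slightly from Numeric's, e.g. no `typecode`):

```python
import numpy as np

x = np.arange(4.0)         # small stand-in data; the thread uses random noise
y = np.arange(4.0) * 10

# Stack the two 1-D arrays as rows of a (2, n) array, then transpose
# to get the desired shape (len(x), 2).
X = np.transpose(np.array([x, y]))

assert X.shape == (4, 2)
assert (X[:, 0] == x).all() and (X[:, 1] == y).all()
```

Note that `np.array([x, y])` copies both inputs into one contiguous block, and (as discussed later in the thread) the transpose itself is nearly free.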
On Wed, 29 Jan 2003, John Hunter wrote:
I have two equal length 1D arrays of 256-4096 complex or floating point numbers which I need to put into a shape=(len(x),2) array.
I need to do this a lot, so I would like to use the most efficient means. Currently I am doing:
    def somefunc(x,y):
        X = zeros( (len(x),2), typecode=x.typecode())
        X[:,0] = x
        X[:,1] = y
        do_something_with(X)
Is this the fastest way?
Maybe you could arrange your algorithm so that you first create X and then reference its columns as x, y without copying:

    # Allocate memory
    X = zeros( (n,2), typecode=.. )
    # Get references to columns
    x = X[:,0]
    y = X[:,1]
    while 1:
        do_something_inplace_with(x,y)
    do_something_with(X)

Pearu
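Pearu's trick works because slicing a column yields a view onto the parent array's memory, so writing through the view fills the big array with no copy at all. A minimal modern-NumPy sketch of the same idea (assuming NumPy rather than the thread's Numeric):

```python
import numpy as np

n = 4
# Allocate the (n, 2) array once up front.
X = np.zeros((n, 2))

# Column slices are views, not copies: they share X's buffer.
x = X[:, 0]
y = X[:, 1]
assert np.shares_memory(x, X) and np.shares_memory(y, X)

# Writing through the views updates X in place, so no per-iteration
# packing step is ever needed.
x[:] = np.arange(n)
y[:] = np.arange(n) * 10
assert (X[:, 0] == np.arange(n)).all()
assert (X[:, 1] == np.arange(n) * 10).all()
```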
"John" == John Hunter
writes:
John> I have two equal length 1D arrays of 256-4096 complex or
John> floating point numbers which I need to put into a
John> shape=(len(x),2) array.

John> I need to do this a lot, so I would like to use the most
John> efficient means. Currently I am doing:

I tested all the suggested methods and the transpose with [x] and [y] was the clear winner, with an 8-fold speed up over my original code. The concatenate method was between 2-3 times faster.

Thanks to all who responded,
John Hunter

cruncher2:~/python/test> python test.py test_naive
test_naive 0.480427026749
cruncher2:~/python/test> python test.py test_concat
test_concat 0.189149975777
cruncher2:~/python/test> python test.py test_transpose
test_transpose 0.0698409080505

    from Numeric import transpose, concatenate, reshape, array, zeros
    from RandomArray import normal
    import time, sys

    def test_naive(x,y):
        "Naive approach"
        X = zeros( (len(x),2), typecode=x.typecode())
        X[:,0] = x
        X[:,1] = y

    def test_concat(x,y):
        "Thanks to Chris Barker and Bryan Cole"
        X = concatenate( ( reshape(x,(-1,1)), reshape(y,(-1,1)) ), 1)

    def test_transpose(x,y):
        "Thanks to Joachim Saul"
        X = transpose(array([x]+[y]))

    m = {'test_naive' : test_naive,
         'test_concat' : test_concat,
         'test_transpose' : test_transpose}

    nse1 = normal(0.0, 1.0, (4096,))
    nse2 = normal(0.0, 1.0, nse1.shape)

    N = 1000
    trials = range(N)
    func = m[sys.argv[1]]

    t1 = time.time()
    for i in trials:
        func(nse1,nse2)
    t2 = time.time()
    print sys.argv[1], t2-t1
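John's benchmark is written against Numeric and Python 2, so it no longer runs as-is. For readers who want to repeat the experiment today, here is a hedged re-creation using the standard-library `timeit` module and NumPy (the `stack` candidate is a modern addition, not one of the thread's original methods; absolute numbers will of course differ from the 2003 figures):

```python
import timeit

# Build fresh test vectors in the setup string, mirroring the
# normal-noise inputs of the original script.
setup = ("import numpy as np; "
         "x = np.random.normal(size=4096); "
         "y = np.random.normal(size=4096)")

candidates = {
    "naive":     "X = np.zeros((len(x), 2)); X[:, 0] = x; X[:, 1] = y",
    "transpose": "X = np.transpose(np.array([x, y]))",
    "stack":     "X = np.column_stack((x, y))",
}

# 1000 repetitions per candidate, as in the original N = 1000 loop.
times = {name: timeit.timeit(stmt, setup=setup, number=1000)
         for name, stmt in candidates.items()}

for name, t in times.items():
    print("%-10s %.4f s" % (name, t))
```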
John Hunter wrote:
John> I have two equal length 1D arrays of 256-4096 complex or John> floating point numbers which I need to put into a John> shape=(len(x),2) array.
I tested all the suggested methods and the transpose with [x] and [y] was the clear winner, with an 8-fold speed up over my original code. The concatenate method was between 2-3 times faster.
I was a little surprised by this, as I figured that the transpose method made an extra copy of the data (array() makes one copy, transpose() another). So I looked at the source for concatenate:

    def concatenate(a, axis=0):
        """concatenate(a, axis=0) joins the tuple of sequences in a
        into a single NumPy array.
        """
        if axis == 0:
            return multiarray.concatenate(a)
        else:
            new_list = []
            for m in a:
                new_list.append(swapaxes(m, axis, 0))
            return swapaxes(multiarray.concatenate(new_list), axis, 0)

So, if you are concatenating along anything other than the zeroth axis, you end up doing something similar to the transpose method. Seeing this, I tried something else:

    def test_concat2(x,y):
        x.shape = (1,-1)
        y.shape = (1,-1)
        X = transpose( concatenate( (x, y) ) )
        x.shape = (-1,)
        y.shape = (-1,)

This then uses the native concatenate, but requires an extra copy in the transpose. Here's a somewhat cleaner version, though you get more copies:

    def test_concat3(x,y):
        "Thanks to Chris Barker and Bryan Cole"
        X = transpose( concatenate( ( reshape(x,(1,-1)), reshape(y,(1,-1)) ) ) )

Here are the test results:

testing on vectors of length: 4096
test_concat 0.286280035973
test_transpose 0.100033998489
test_naive 0.805399060249
test_concat3 0.109319090843
test_concat2 0.136469960213

All the transpose methods are essentially a tie. Would it be that hard for concatenate to do its thing for any axis in C? It does seem like this is a fairly basic operation, and shouldn't require more than one copy.

By the way, I realised that the transpose method had an extra call. transpose() can take an appropriate python sequence, so this works just fine:

    def test_transpose2(x,y):
        X = transpose([x]+[y])

However, it doesn't really save you the copy, as I'm pretty sure transpose makes a copy internally anyway. Test results:

testing on vectors of length: 4096
test_transpose 0.104995965958
test_transpose2 0.103582024574

I think the winner is:

    X = transpose([x]+[y])

Well, I learned a little bit more about Numeric today.
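Chris's wish — a concatenate that handles any axis in C, with only one copy — was eventually granted: modern NumPy's `column_stack` and `stack` build the (len(x), 2) array directly, with no intermediate (2, len(x)) array to transpose. A small sketch under the assumption of modern NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# Both spellings produce the target shape in a single pass,
# without transposing an intermediate row-stacked array.
X1 = np.column_stack((x, y))
X2 = np.stack((x, y), axis=1)

assert X1.shape == (3, 2)
assert (X1 == X2).all()
assert (X1[:, 0] == x).all() and (X1[:, 1] == y).all()
```

As a bonus, the result is C-contiguous from the start, which sidesteps the strided-view pitfall Rob Hooft raises below.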
Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
7600 Sand Point Way NE
Seattle, WA 98115
Chris.Barker@noaa.gov
Chris Barker wrote:
X = transpose([x]+[y])
well, I learned a little bit more about Numeric today.
I've been skipping through a lot of messages today because I was getting behind on mailing list traffic, but I missed one thing in the discussion so far (sorry if it was marked already): transpose doesn't actually do any work.

transpose only sets the "strides" counts differently, and this is blazingly fast. What is NOT fast is using the transposed array later! The problem is that many routines actually require a contiguous array, and will make a temporary local contiguous copy. This may happen multiple times if the lifetime of the transposed array is long. Even routines that do not require a contiguous array and can actually use the strides may run significantly slower because the CPU cache is thrashed a lot by the high strides.

Moral: you can't test this code by looping 1000 times through it; you actually should take into account the time it takes to make a contiguous array immediately after the transpose call.

Regards,
Rob Hooft
--
Rob W.W. Hooft | rob@hooft.net | http://www.hooft.net/people/rob/
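Rob's point is easy to demonstrate in modern NumPy (an assumption; the thread's Numeric lacked these exact flags, though the underlying behavior was the same): the transpose is only a strided view, and the real cost appears when a consumer forces a contiguous copy.

```python
import numpy as np

x = np.ones(4096)
y = np.zeros(4096)

# np.array([x, y]) is C-contiguous with shape (2, n); transposing it
# only rewrites the strides, so the result is no longer C-contiguous.
Xt = np.transpose(np.array([x, y]))
assert not Xt.flags["C_CONTIGUOUS"]

# Routines that need a contiguous buffer will copy Xt, possibly on
# every call.  Paying for the copy once, up front, makes the cost
# visible and avoids repeating it over the array's lifetime.
Xc = np.ascontiguousarray(Xt)
assert Xc.flags["C_CONTIGUOUS"]
assert (Xc == Xt).all()
```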
participants (5)
- Chris Barker
- Joachim Saul
- John Hunter
- Pearu Peterson
- Rob Hooft