
Why is the second method of converting a list of tuples to an array so much faster?
x = range(500)
x = [(z,) for z in x]  # <-- e.g. output of a sql database
x[:5]
[(0,), (1,), (2,), (3,), (4,)]

timeit np.array(x).reshape(-1)  # <-- slow
1000 loops, best of 3: 832 us per loop
timeit np.array([z[0] for z in x])  # <-- fast
10000 loops, best of 3: 106 us per loop
Is it a fixed-overhead advantage? It doesn't seem so:

x = range(50000)
x = [[z] for z in x]
timeit np.array(x).reshape(-1)
10 loops, best of 3: 83 ms per loop
timeit np.array([z[0] for z in x])
100 loops, best of 3: 9.81 ms per loop
So it is probably faster to make a 1d array and reshape it:
timeit np.array([[1,2], [3,4], [5,6]])
100000 loops, best of 3: 11.8 us per loop
timeit np.array([1,2,3,4,5,6]).reshape(-1,2)
100000 loops, best of 3: 6.62 us per loop
Yep.
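
For multi-column rows the same flatten-then-reshape idea applies; a minimal sketch (the chain-based flattening and the variable names here are illustrative, not from the thread):

from itertools import chain
import numpy as np

rows = [(1, 2), (3, 4), (5, 6)]        # e.g. two-column SQL output
# Flatten to plain scalars first so array() never has to inspect
# nested sequences, then reshape to recover the 2-D layout.
flat = np.array(list(chain.from_iterable(rows)))
arr = flat.reshape(-1, 2)              # shape (nrows, 2)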

On Fri, Feb 5, 2010 at 12:26, Keith Goodman <kwgoodman@gmail.com> wrote:
> Why is the second method of converting a list of tuples to an array
> so much faster?
>
> x = range(500)
> x = [(z,) for z in x]  # <-- e.g. output of a sql database
> x[:5]
> [(0,), (1,), (2,), (3,), (4,)]
>
> timeit np.array(x).reshape(-1)  # <-- slow
> 1000 loops, best of 3: 832 us per loop
> timeit np.array([z[0] for z in x])  # <-- fast
> 10000 loops, best of 3: 106 us per loop
When array() gets a sequence of sequences, it has to do more work to figure out the appropriate shape.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
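
One way to sidestep that shape-discovery work entirely is np.fromiter, which builds a 1-D array directly from an iterator of scalars; a hedged sketch, not something suggested in the thread:

import numpy as np

x = [(z,) for z in range(50000)]       # e.g. rows from a SQL query
# fromiter consumes one scalar at a time, so no nested-sequence
# inspection is needed; count lets it preallocate the output array.
a = np.fromiter((z[0] for z in x), dtype=np.int64, count=len(x))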