[Numpy-discussion] Good way to develop numpy as popular choice!
e.antero.tammi at gmail.com
Fri Jun 22 12:12:56 EDT 2012
On Fri, Jun 22, 2012 at 6:05 PM, Benjamin Root <ben.root at ou.edu> wrote:
> On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant <travis at continuum.io>wrote:
>> Accessing individual elements of NumPy arrays is slower than accessing
>> individual elements of lists --- around 2.5x-3x slower. NumPy has to do
>> more work to figure out what kind of indexing you are trying to do because
>> of its flexibility. It also has to create the Python object to return.
>> In contrast, the list approach already has the Python objects created and
>> you are just returning pointers to them and there is much less flexibility
>> in the kinds of indexing you can do.
>> Simple timings show that a.item(i,j) is about 2x slower than list element
>> access (but faster than a[i,j] which is about 2.5x to 3x slower). The
>> slowness of a.item is due to the need to create the Python object to return
>> (there are just raw bytes there) so it gives some idea of the relative cost
>> of each part of the slowness of a[i,j].
>> Also, math on the array scalars returned from NumPy will be slower than
>> math on integers and floats --- because NumPy re-uses the ufunc machinery
>> which is not optimized at all for scalars.
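The array-scalar overhead described here is easy to observe directly. The sketch below is only illustrative (it assumes a recent NumPy under Python 3, so details differ from the Python 2 session quoted in this thread); it times the same arithmetic on a NumPy array scalar and on a plain Python int:

```python
import timeit
import numpy as np

a = np.arange(100)
np_scalar = a[0]       # numpy.int64 array scalar
py_scalar = int(a[0])  # plain Python int

# Math on the array scalar goes through NumPy's generic machinery,
# while the Python int takes the fast built-in integer path.
t_np = timeit.timeit("x + 1", globals={"x": np_scalar}, number=100_000)
t_py = timeit.timeit("x + 1", globals={"x": py_scalar}, number=100_000)

print(f"numpy scalar: {t_np:.4f}s  python int: {t_py:.4f}s")
```

Absolute numbers are machine-dependent; the point is only the relative gap between the two scalar types.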
>> The take-away is that NumPy is built for doing vectorized operations on
>> bytes of data. It is not optimized for doing element-by-element
>> individual access. The right approach there is to just use lists (or use
>> a version specialized for the kind of data in the lists that removes the
>> boxing and unboxing).
>> Here are my timings using IPython for NumPy indexing:
>> In : a = arange(100)
>> In : %timeit [a.item(i) for i in xrange(100)]
>> 10000 loops, best of 3: 25.6 us per loop
>> In : %timeit [a[i] for i in xrange(100)]
>> 10000 loops, best of 3: 31.8 us per loop
>> In : al = a.tolist()
>> In : %timeit [al[i] for i in xrange(100)]
>> 100000 loops, best of 3: 10.6 us per loop
>> In : a = arange(100).reshape(10,10)
>> In : al = a.tolist()
>> In : %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
>> 10000 loops, best of 3: 18.6 us per loop
>> In : %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
>> 10000 loops, best of 3: 44.4 us per loop
>> In : %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
>> 10000 loops, best of 3: 34.2 us per loop
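For anyone who wants to rerun the comparison, the session above can be reproduced with the stdlib timeit module. This is a sketch only (Python 3, so range instead of xrange; numbers will differ by machine):

```python
import timeit
import numpy as np

a = np.arange(100).reshape(10, 10)
al = a.tolist()

# Element-by-element access: nested lists vs. a[i, j] vs. a.item(i, j)
stmts = {
    "list al[i][j]": "[al[i][j] for i in range(10) for j in range(10)]",
    "array a[i, j]": "[a[i, j] for i in range(10) for j in range(10)]",
    "array a.item ": "[a.item(i, j) for i in range(10) for j in range(10)]",
}

for label, stmt in stmts.items():
    t = timeit.timeit(stmt, globals={"a": a, "al": al}, number=1000)
    print(f"{label}: {t * 1e3:.1f} us per loop")  # t sec / 1000 loops = t*1e3 us
```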
> However, what is the timing/memory cost of converting a large NumPy array
> that already exists into a Python list of lists? If all my processing before
> the munkres step is using NumPy, converting it into python lists has a
> cost. Also, your timings indicate only ~2x slowdown, while the timing
> tests done by eat show an order-of-magnitude difference. I suspect there
> is great room for improvement before even starting to worry about the array
> access issues.
Creating a list of lists from an array is quite fast:
In : A= rand(500, 500)
In : %timeit A.tolist()
100 loops, best of 3: 10.8 ms per loop
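As a sanity check on the conversion itself, tolist() produces ordinary nested Python lists whose elements are native Python floats, and the values round-trip exactly. A minimal sketch:

```python
import numpy as np

A = np.random.rand(500, 500)
L = A.tolist()  # nested list of plain Python floats

# The converted structure is ordinary Python data...
assert isinstance(L, list) and isinstance(L[0], list)
assert isinstance(L[0][0], float)

# ...and the values match the array exactly (no precision loss).
assert L[123][456] == A[123, 456]
print(len(L), len(L[0]))  # 500 500
```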
> Ben Root