[Numpy-discussion] List comprehension and loops performances with NumPy arrays

Chris Barker - NOAA Federal chris.barker at noaa.gov
Wed Oct 11 00:51:31 EDT 2017


Andrea,

One note: transposing is almost free. It just rearranges the strides,
i.e. it changes how the array is interpreted. It doesn’t actually move the
data around.
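
To make that concrete, here is a minimal sketch (the array `a` and its
shape are made up for illustration; they are not from Andrea's code). It
shows that `.T` returns a view with the strides swapped and that a write
through the transposed view shows up in the original array:

```python
import numpy

# Hypothetical example array: 2 rows x 3 columns of float64 (8 bytes each).
a = numpy.zeros((2, 3), dtype=numpy.float64)
t = a.T

# Same buffer, different interpretation: shape and strides are reversed.
print('a: shape=%s strides=%s' % (a.shape, a.strides))
print('t: shape=%s strides=%s' % (t.shape, t.strides))

# The transpose shares memory with the original, so nothing was copied.
shared = numpy.shares_memory(a, t)
print('shares memory: %s' % shared)

# Writing through the view changes the original array.
t[0, 1] = 99.0
print('a[1, 0] is now %s' % a[1, 0])
```

If a contiguous transposed copy is really needed, something like
`numpy.ascontiguousarray(a.T)` does move the data, and that is where the
cost would show up.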

-CHB

On Oct 7, 2017, at 2:58 AM, Andrea Gavana <andrea.gavana at gmail.com> wrote:

Apologies, correct timeit code this time (I had gotten the wrong shape for
the output matrix in the loop case):

if __name__ == '__main__':

    repeat = 1000
    items = [Item('item_%d'%(i+1)) for i in xrange(500)]

    output = numpy.asarray([item.do_something() for item in items]).T
    statements = ['''
                  output = numpy.asarray([item.do_something() for item in items]).T
                  ''',
                  '''
                  output = numpy.empty((8, 500))
                  for i, item in enumerate(items):
                      output[:, i] = item.do_something()
                  ''']

    methods = ['List Comprehension', 'Empty plus Loop   ']
    setup  = 'from __main__ import numpy, items'

    for stmnt, method in zip(statements, methods):

        elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
        minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
        elapsed.sort()
        best_of_3 = numpy.mean(elapsed[0:3])
        # timeit returns seconds; convert to milliseconds
        result = numpy.asarray((minv, maxv, meanv, best_of_3))*1000

        print method, (': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , '
                       'BEST OF 3: %0.2f ms' % tuple(result.tolist()))


Results are the same as before...
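
As a sanity check on the shape fix, here is a minimal sketch confirming
that the two timed statements build the same (8, 500) array. It reuses the
toy Item class from the quoted message below; the names `via_list` and
`via_loop` are made up for this check and are not part of the benchmark:

```python
import numpy


class Item(object):
    # Toy class from the quoted message, minus the unused `sv` line.
    def __init__(self, name):
        self.name = name
        self.values = numpy.random.rand(8, 1)

    def do_something(self):
        array = numpy.empty((8, ))
        f = numpy.dot(0.5*numpy.ones((8, )), self.values)[0]
        array.fill(f)
        return array


items = [Item('item_%d' % (i+1)) for i in range(500)]

# Approach 1: list comprehension, then asarray, then transpose.
via_list = numpy.asarray([item.do_something() for item in items]).T

# Approach 2: preallocate with the final shape and fill column by column.
via_loop = numpy.empty((8, 500))
for i, item in enumerate(items):
    via_loop[:, i] = item.do_something()

same_shape = via_list.shape == via_loop.shape == (8, 500)
same_values = numpy.array_equal(via_list, via_loop)
print('same shape: %s, same values: %s' % (same_shape, same_values))

# The values agree, but the memory layout differs: the transposed result
# is Fortran-ordered, while the preallocated array is C-ordered.
print('via_list F-contiguous: %s' % via_list.flags['F_CONTIGUOUS'])
print('via_loop C-contiguous: %s' % via_loop.flags['C_CONTIGUOUS'])
```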



On 7 October 2017 at 11:52, Andrea Gavana <andrea.gavana at gmail.com> wrote:

> Hi All,
>
>     I have this little snippet of code:
>
> import timeit
> import numpy
>
> class Item(object):
>
>     def __init__(self, name):
>
>         self.name = name
>         self.values = numpy.random.rand(8, 1)
>
>     def do_something(self):
>
>         sv = self.values.sum(axis=0)
>         array = numpy.empty((8, ))
>         f = numpy.dot(0.5*numpy.ones((8, )), self.values)[0]
>         array.fill(f)
>         return array
>
>
> In my real application, the method do_something does a bit more than that,
> but I believe the snippet is enough to start playing with it. What I have
> is a list of (on average) 500-1,000 instances of Item, and I am trying to
> retrieve the output of do_something for each of them in a single, big 2D
> numpy array.
>
> My current approach is to use list comprehension like this:
>
> output = numpy.asarray([item.do_something() for item in items]).T
>
> (Note: I always need the transpose of that 2D array.)
>
> But then I thought: why not preallocate the output array and use a
> simple loop:
>
> output = numpy.empty((500, 8))
> for i, item in enumerate(items):
>     output[i, :] = item.do_something()
>
>
> I was expecting this version to be marginally faster - as the previous one
> has to call asarray and then transpose the matrix, but I was in for a
> surprise:
>
> if __name__ == '__main__':
>
>     repeat = 1000
>     items = [Item('item_%d'%(i+1)) for i in xrange(500)]
>
>     statements = ['''
>                   output = numpy.asarray([item.do_something() for item in items]).T
>                   ''',
>                   '''
>                   output = numpy.empty((500, 8))
>                   for i, item in enumerate(items):
>                       output[i, :] = item.do_something()
>                   ''']
>
>     methods = ['List Comprehension', 'Empty plus Loop   ']
>
>     setup  = 'from __main__ import numpy, items'
>
>     for stmnt, method in zip(statements, methods):
>
>         elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
>         minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
>         elapsed.sort()
>         best_of_3 = numpy.mean(elapsed[0:3])
>         # timeit returns seconds; convert to milliseconds
>         result = numpy.asarray((minv, maxv, meanv, best_of_3))*1000
>
>         print method, (': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , '
>                        'BEST OF 3: %0.2f ms' % tuple(result.tolist()))
>
>
> I get this:
>
> List Comprehension : MIN: 7.32 ms , MAX: 9.13 ms , MEAN: 7.85 ms , BEST OF 3: 7.33 ms
> Empty plus Loop    : MIN: 7.99 ms , MAX: 9.57 ms , MEAN: 8.31 ms , BEST OF 3: 8.01 ms
>
>
> Now, I know that list comprehensions are renowned for being insanely fast,
> but I thought that doing asarray plus transpose would more than cancel out
> their advantage, especially since the list comprehension is used to call a
> method, not to do some simple arithmetic inside it...
>
> I guess I am missing something obvious here... oh, and if anyone has
> suggestions about how to improve my crappy code (performance-wise), please
> feel free to add your thoughts.
>
> Thank you.
>
> Andrea.