[Numpy-discussion] Optimization of loops

Thu Oct 23 18:55:50 EDT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks everybody for all your feedback !!!

Ok, so first, I just must have to precise what was my problem and why I
used Numpy. Because I want to extract only some ids in the lists. In
fact, I've a test in my list implementation:

for id, time in my_list:
    if id in id_list:
        res[id].append(time)

So this test is what slows down the whole stuff. The implementation
proposed with bisect is very nice and has the advantaged of not spending
time allocating numpy arrays, and also to avoid the previous test. The
list where id will not be in id_list will return empty lists, and that's
what I wanted.

import bisect
my_list.sort()
time, ind = zip(*my_list)
for i in id_list :
    beg = bisect.bisect_left(ind,i)
    end = bisect.bisect_right(ind,i)
    mylist.append(tim[beg:end])

But thanks a lot for all you precious comments, you solve a big part of
my problem. I'll see how to use generator and how it speeds up or not
the whole process. I would also be very interested to know what the code
looks like with Cython and what are the results....

Cheers,

Pierre Yger

>>>> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)]
>> mysort(spikes) 
>>
>> should return:
>>
>> [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]]
>>
>> Intuitively, the simplest way to do that is to append elements while going 
>> through all the tuples of the main list (called spikes) to empty lists:
>>
>> res = [[] for i in xrange(N)]
>>
>> for id, time in my_list:
>>         res[id].append(time)
>>
>> But this loop seems to be incredibly slow for large lists ! 
>>
>> A faster way (after having performed some profiling) seems to do:
>> spikes = numpy.array(spikes) # Convert the list into a numpy array
>> res = []
>> for id in xrange(N):
>>         res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes
>>
>> Nevertheless, this is still rather slow. Does anybody have any idea about a 
>> faster way to do this ? Is there a Numpy function that could be used ?
> 
> I'm surprised you find using numpy indexing (your second method) is
> faster. In my simple test (N=10, 100000 points in the list), your method
> 1 (using python lists) took 0.15s whereas your method 2 (numpy indexing)
> took 1.8s. Big difference. 
> 
> I wonder how you are profiling this. I expect the numpy indexing to be
> quite fast (and you only do it N times) but the overhead of copying the
> entire list into an array is probably the bottleneck. If you need to
> iterate over the final output arrays, there's a further penalty in using
> a numpy array as iteration is not as fast as for a python list. 
> 
> There might be some benefit in using numpy arrays if you can create your
> input directly as an array without creating the list-of-tuples first.
> 
> If you really need more speed you could try Cython (a means of compiling
> python to C-code, while adding type-declarations to permit
> optimisation). There may be a worthwhile improvement running the
> python-list method through Cython (I'll try this tonight to satisfy my
> own curiosity). If you can arrange for the inputs and outputs to be
> numpy arrays, then you can do the iteration over the data and copy into
> output array using pure C (but coded in Cython; much easier than C).
> This will be as fast as it's possible to go. You need one pass to figure
> out how big the output arrays need to be then a second to copy the data.
> 
> Finally, an even better approach may be to avoid creating such a large
> data-structure in the first place. If you can replace your list with a
> generator, you save time by avoiding the need to allocate the memory to
> hold a large list. Similarly, if you don't need to store the output
> lists for repeated use, then outputing a list of generators may be more
> efficient. Whether this is viable depends on the context.
> 
> HTH
> BC
> 
>> Thanks in advance, 
>>
>> Pierre
> 
> 
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkkBAPUACgkQTyWWLSJwVgLX1wCfYtO89V5nicsevkMm1CWShP/j
QqsAnRdqqkqsoy7rwfT2dQWCgj/Jx+/X
=qDBo
-----END PGP SIGNATURE-----