[Numpy-discussion] Optimization of loops

Wed Oct 22 12:21:48 EDT 2008

Hi all, 

This is my first mail to the mailing list, and I would like to know if anybody 
has a good idea about when to use NumPy rather than plain Python loops.

So here is my problem: I have a large list of tuples (id, time), 
id being an integer in [0, ..., N-1] and time a float value.
I want a mysort() function that splits this list into 
N lists of different sizes, each containing the times associated with one 
id.

Example:

>> spikes = [(0, 2.3),(1, 5.6),(3, 2.5),(0, 5.2),(3, 10.2),(2, 16.2)]

mysort(spikes) 

should return:

[[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]]

Intuitively, the simplest way to do that is to go through all the tuples of 
the main list (called spikes) and append each time to one of N initially 
empty lists:

res = [[] for i in xrange(N)]

for id, time in spikes:
        res[id].append(time)
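
For reference, here is a self-contained sketch of this loop wrapped in a function and run on the example above. It assumes the number of ids is passed in explicitly (the parameter name n_ids is introduced here only for illustration) and it is written for Python 3, so range replaces xrange.

```python
def mysort(spikes, n_ids):
    """Group spike times by id with a plain Python loop."""
    # One empty list per id; ids are assumed to lie in 0..n_ids-1.
    res = [[] for _ in range(n_ids)]
    for spike_id, time in spikes:
        res[spike_id].append(time)
    return res


spikes = [(0, 2.3), (1, 5.6), (3, 2.5), (0, 5.2), (3, 10.2), (2, 16.2)]
result = mysort(spikes, 4)
print(result)  # [[2.3, 5.2], [5.6], [16.2], [2.5, 10.2]]
```

Within each id, the times keep the order in which they appear in the input list.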

But this loop seems to be incredibly slow for large lists!

A faster way (after having performed some profiling) seems to be:
import numpy
spikes = numpy.array(spikes) # Convert the list into a 2-column numpy array
res = []
for id in xrange(N):
        res.append(spikes[spikes[:,0] == id, 1]) # Use Numpy indexes
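
The same masking idea can be written as a self-contained sketch, shown below on the example data. The function name mysort_masked and the parameter n_ids are hypothetical names used only for illustration. Note that converting the list of tuples to a single array turns the ids into floats, and that every mask comparison scans the whole id column, so the total work grows with N times the number of spikes.

```python
import numpy as np


def mysort_masked(spikes, n_ids):
    """Group spike times by id using one boolean mask per id."""
    # Shape (len(spikes), 2); the integer ids are stored as floats here.
    arr = np.array(spikes, dtype=float)
    ids = arr[:, 0]
    times = arr[:, 1]
    # Each comparison ids == i is a full pass over the id column.
    return [times[ids == i] for i in range(n_ids)]


spikes = [(0, 2.3), (1, 5.6), (3, 2.5), (0, 5.2), (3, 10.2), (2, 16.2)]
result = mysort_masked(spikes, 4)
for i, times_for_id in enumerate(result):
    print(i, times_for_id.tolist())
```

Unlike the pure Python loop, this version returns a list of numpy arrays rather than a list of lists.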

Nevertheless, this is still rather slow. Does anybody have an idea for a 
faster way to do this? Is there a NumPy function that could be used?

Thanks in advance, 

Pierre