Thank you so much for the suggestion, Paulo! Selecting 2D points in a list
by creating an array 'mask' of booleans and then using arr[mask, :] is indeed
really fast compared to using numpy.apply_along_axis(), in my case (simple
"larger than" tests on individual coordinates).
I had not realized that you could do "arr[mask, :]": this works great!
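For anyone reading along, here is a minimal sketch of the idea on a tiny hand-made array. The values and the names arr, mask and selected are made up for illustration and are not taken from my program:

```python
import numpy

# Four hypothetical 2D points, one per row (x in column 0, y in column 1).
arr = numpy.array([[0.9, 0.1],
                   [0.2, 0.4],
                   [0.7, 0.8],
                   [0.6, 0.3]])

# One boolean per row: True where x > 0.5 and y < 0.5.
mask = (arr[:, 0] > 0.5) & (arr[:, 1] < 0.5)

# Boolean indexing on the first axis keeps only the rows where mask is True.
selected = arr[mask, :]

print(mask)
print(selected)
```

As far as I can tell, arr[mask] selects the same rows as arr[mask, :]; the explicit ", :" only makes it clearer that whole rows are kept.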
EOL
PS: here are the speed tests I've done on the selection of 2D points from a
list, with the following results:
filter0: 107.2 s
filter1: 0.3 s
filter2: 9.7 s
filter3: 0.6 s
obtained with:
#!/usr/bin/env python

import numpy

def filter0(points):
    """
    Returns only those points that match the filter.
    """
    # A Python function is called once per row (slowest variant).
    def filter(p):
        return (p[0] > 0.5) and (p[1] < 0.5)
    return points[numpy.apply_along_axis(filter, axis = 1, arr = points)]

def filter1(points):
    """
    Returns only those points that match the filter.
    """
    # A single boolean mask combines both conditions.
    mask = (points[:, 0] > 0.5) & (points[:, 1] < 0.5)
    return points[mask, :]

def filter2(points):
    """
    Returns only those points that match the filter.
    """
    # Pure Python loop over the rows, through a list comprehension.
    return numpy.array([p for p in points if (p[0] > 0.5) and (p[1] < 0.5)])

def filter3(points):
    """
    Returns only those points that match the filter.
    """
    # Two successive masks: the y test only sees points that passed the x test.
    mask = (points[:, 0] > 0.5)
    points = points[mask, :]
    mask = points[:, 1] < 0.5
    return points[mask, :]

if __name__ == '__main__':

    import timeit

    # We generate many random points:
    NUM_PTS = 1000000
    points = numpy.random.random((NUM_PTS, 2))

    # We make sure that all the filters give the same result:
    #print("Initial points:")
    #print(points)
    #print("Filtered points:")
    #print(filter0(points))
    #print(filter1(points))
    #print(filter2(points))
    #print(filter3(points))

    # Each filter is timed over 3 runs on the same points:
    for filter_num in range(4):
        func_name = "filter%d" % filter_num
        t = timeit.Timer("%s(points)" % func_name,
                         "from __main__ import %s, points" % func_name)
        print("%s: %.1f s" % (func_name, t.timeit(number = 3)))
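
The commented-out prints in the script were meant to check by eye that all the filters agree. Here is a hedged sketch of an automated version of that check, on a smaller sample so that the slow variants finish quickly. The function names are illustrative stand-ins for filter0 to filter3, and the sample size and seed are arbitrary choices:

```python
import numpy

def filter_apply(points):
    # Same test as filter0: a Python callback evaluated once per row.
    def keep(p):
        return (p[0] > 0.5) and (p[1] < 0.5)
    return points[numpy.apply_along_axis(keep, axis=1, arr=points)]

def filter_mask(points):
    # Same test as filter1: one combined boolean mask.
    mask = (points[:, 0] > 0.5) & (points[:, 1] < 0.5)
    return points[mask, :]

def filter_listcomp(points):
    # Same test as filter2: list comprehension over the rows.
    return numpy.array([p for p in points if (p[0] > 0.5) and (p[1] < 0.5)])

def filter_two_steps(points):
    # Same test as filter3: x condition first, then y condition.
    points = points[points[:, 0] > 0.5, :]
    return points[points[:, 1] < 0.5, :]

# Small reproducible sample (arbitrary size and seed).
rng = numpy.random.RandomState(0)
sample = rng.random_sample((1000, 2))

# The mask version serves as the reference result.
reference = filter_mask(sample)
all_equal = all(
    numpy.array_equal(reference, f(sample))
    for f in (filter_apply, filter_listcomp, filter_two_steps)
)
print(all_equal)
```

One caveat: if no point passes the test, the list comprehension variant returns an empty array of shape (0,) instead of (0, 2), so the comparison would fail on such an input even though all variants select "nothing".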
> Date: Mon, 12 Jan 2009 11:33:08 -0300
> From: "Paulo J. S. Silva" <pjssilva(a)ime.usp.br>
> Subject: Re: [Numpy-discussion] Fast function application on list of
> 2D points?
> To: Discussion of Numerical Python <numpy-discussion(a)scipy.org>
> Message-ID: <1231770788.6170.3.camel@trinity>
> Content-Type: text/plain; charset="UTF-8"
>
> Why don't you create a mask to select only the points in the array that
> satisfy the conditions on the x and y coordinates? For example, the code
> below applies filter only to the points that have an x coordinate larger
> than 0.7 and a y coordinate smaller than 0.3:
>
> mask = numpy.logical_and(points[:,0] > 0.7, points[:,1] < 0.3)
> points = numpy.apply_along_axis(filter, axis = 1, arr = points[mask,:])
>
> best,
>
> Paulo
>
> On Mon, 2009-01-12 at 15:21 +0100, Eric LEBIGOT wrote:
>> Hello,
>>
>> What is the fastest way of applying a function to a list of 2D points? More
>> specifically, I have a list of 2D points, and some do not meet some criteria
>> and must be rejected. Even more specifically, the filter only lets through
>> points whose x coordinate satisfies some condition, _and_ whose y coordinate
>> satisfies another condition (maybe there is room for optimization here?).
>>
>> Currently, I use
>>
>> points = numpy.apply_along_axis(filter, axis = 1, arr = points)
>>
>> but this creates a bottleneck in my program (array arr may contain 1 million
>> points, for instance).
>>
>> Is there anything that could be faster?
>>
>> Any suggestion would be much appreciated!
>>
>> EOL