[Numpy-discussion] record arrays and vectorizing
Richard Hattersley
rhattersley at gmail.com
Thu May 3 04:39:26 EDT 2012
Sounds like it could be a good match for `scipy.spatial.cKDTree`.
It can handle single-element queries...
>>> import numpy
>>> import scipy.spatial
>>> element = numpy.arange(1, 8)
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> tree = scipy.spatial.cKDTree(targets)
>>> distance, index = tree.query(element)
>>> targets[index]
array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
        6.46823904,  6.12957534])
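As a sanity check, the single-element query can be compared against a plain NumPy brute-force search. This is only a sketch: the fixed seed and the names `rng`, `brute_distances` and `brute_index` are my own choices for illustration, not part of the original problem.

```python
import numpy as np
from scipy.spatial import cKDTree

# Fixed seed (an assumption, only so the run is repeatable).
rng = np.random.RandomState(0)
targets = rng.uniform(0, 8, (1000, 7))
element = np.arange(1, 8)

# Tree-based lookup: Euclidean distance to, and index of, the nearest target.
tree = cKDTree(targets)
distance, index = tree.query(element)

# Brute-force equivalent: broadcast `element` against every row of `targets`.
brute_distances = np.sqrt(((targets - element) ** 2).sum(axis=1))
brute_index = brute_distances.argmin()

print(index == brute_index)
```

Both approaches should agree; the tree only starts to pay off once there are many queries against the same set of targets.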
Or even multi-element queries (shown here searching for 3 elements in one
call)...
>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> elements
array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
       [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
       [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
>>> distances, indices = tree.query(elements)
>>> targets[indices]
array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
         2.80559459,  4.43030939],
       [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
         4.06619636,  4.7183393 ],
       [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
         6.54570121,  7.59566404]])
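To get the arrays of "min_index" and "min_length" values asked about in the original question, the batch query could be wrapped up roughly as below. This is a sketch under my own assumptions: the function names `nearest_targets` and `nearest_targets_brute` are hypothetical, and the brute-force version is included only as a cross-check using NumPy broadcasting.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_targets(elements, targets):
    """Return (min_index, min_length) for each row of `elements`."""
    tree = cKDTree(targets)
    # query() on an (n, 7) array returns two length-n arrays.
    distances, indices = tree.query(elements)
    return indices, distances


def nearest_targets_brute(elements, targets):
    """Same result by broadcasting; builds an (n_elements, n_targets) array."""
    diff = elements[:, np.newaxis, :] - targets[np.newaxis, :, :]
    all_distances = np.sqrt((diff ** 2).sum(axis=-1))
    indices = all_distances.argmin(axis=1)
    return indices, all_distances[np.arange(len(elements)), indices]


# Fixed seed (an assumption, only so the run is repeatable).
rng = np.random.RandomState(0)
targets = rng.uniform(0, 8, (1000, 7))
elements = np.linspace(1, 8, 21).reshape((3, 7))

min_index, min_length = nearest_targets(elements, targets)
brute_index, brute_length = nearest_targets_brute(elements, targets)
print(np.array_equal(min_index, brute_index))
```

The broadcasting version needs no SciPy, but its intermediate array grows as n_elements * n_targets * 7, so for a large number of elements and targets the tree is likely to be the more practical choice.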
Richard Hattersley
On 2 May 2012 19:06, Moroney, Catherine M (388D) <
Catherine.M.Moroney at jpl.nasa.gov> wrote:
> Hello,
>
> Can somebody give me some hints as to how to code up this function
> in pure python, rather than dropping down to Fortran?
>
> I will want to compare a 7-element vector (called "element") to a large
> list of similarly-dimensioned vectors (called "target"), and pick out the
> vector in "target" that is the closest to "element" (determined by
> minimizing the Euclidean distance).
>
> For instance, in (slow) brute force form it would look like:
>
> element = numpy.array([1, 2, 3, 4, 5, 6, 7])
> target = numpy.array(range(0, 49)).reshape(7,7)*0.1
>
> min_length = 9999.0
> min_index =
> for i in xrange(0, 7):
>     distance = (element - target[i])**2
>     distance = numpy.sqrt(distance.sum())
>     if distance < min_length:
>         min_length = distance
>         min_index = i
>
> Now of course, the actual problem will be of a much larger scale. I will
> have an array of elements, and a large number of potential targets.
>
> I was thinking of having element be an array where each element itself is
> a numpy.ndarray, and then vectorizing the code above so as an output I
> would have an array of the "min_index" and "min_length" values.
>
> I can get the following simple test to work so I may be on the right track:
>
> import numpy
>
> dtype = [("x", numpy.ndarray)]
>
> def single(data):
>     return data[0].min()
>
> multiple = numpy.vectorize(single)
>
> if __name__ == "__main__":
>
>     a = numpy.arange(0, 16).reshape(4,4)
>     b = numpy.recarray((4), dtype=dtype)
>     for i in xrange(0, b.shape[0]):
>         b[i]["x"] = a[i,:]
>
>     print a
>     print b
>
>     x = multiple(b)
>     print x
>
> What is the best way of constructing "b" from "a"? I tried
> b = numpy.recarray((4), dtype=dtype, buf=a)
> but I get a segmentation fault when I try to print b.
>
> Is there a way to perform this larger task efficiently with record arrays
> and vectorization, or am I off on the wrong track completely? How can I do
> this efficiently without dropping down to Fortran?
>
> Thanks for any advice,
>
> Catherine