[Numpy-discussion] record arrays and vectorizing

Richard Hattersley rhattersley at gmail.com
Thu May 3 04:39:26 EDT 2012


Sounds like it could be a good match for `scipy.spatial.cKDTree`.

It can handle single-element queries...

>>> import numpy
>>> import scipy.spatial
>>> element = numpy.arange(1, 8)
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> tree = scipy.spatial.cKDTree(targets)
>>> distance, index = tree.query(element)
>>> targets[index]
array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
        6.46823904,  6.12957534])
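
As a sanity check, the single-element query above can be compared with a plain NumPy brute-force search. This is only a sketch: the fixed seed and the names `rng`, `brute`, `brute_index` and `brute_distance` are my own additions for illustration. With its default arguments, `query` returns the Euclidean distance to the nearest neighbour along with that neighbour's index.

```python
import numpy
import scipy.spatial

# Fixed seed so the run is repeatable (an assumption, not in the session above).
rng = numpy.random.RandomState(0)

element = numpy.arange(1, 8)
targets = rng.uniform(0, 8, (1000, 7))

# Build the tree once, then ask for the single nearest neighbour.
tree = scipy.spatial.cKDTree(targets)
distance, index = tree.query(element)

# Brute force: Euclidean distance from `element` to every row of `targets`.
brute = numpy.sqrt(((targets - element) ** 2).sum(axis=1))
brute_index = brute.argmin()
brute_distance = brute.min()

print(index == brute_index, numpy.isclose(distance, brute_distance))
```

The tree costs something to build, so it pays off mainly when many queries are run against the same set of targets.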

Or even multi-element queries (shown here searching for 3 elements in one
call)...

>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> elements
array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
       [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
       [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
>>> distances, indices = tree.query(elements)
>>> targets[indices]
array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
         2.80559459,  4.43030939],
       [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
         4.06619636,  4.7183393 ],
       [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
         6.54570121,  7.59566404]])
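
If you would rather stay in plain NumPy, the same multi-element search can be written with broadcasting instead of a Python loop. The following is a rough sketch, not a tuned implementation: the seed and the names `diff`, `brute`, `brute_indices` and `brute_distances` are assumptions of mine. It builds an intermediate array of shape (n_elements, n_targets, 7), so memory use grows quickly for large inputs.

```python
import numpy
import scipy.spatial

# Fixed seed so the run is repeatable (an assumption for this sketch).
rng = numpy.random.RandomState(0)

targets = rng.uniform(0, 8, (1000, 7))
elements = numpy.linspace(1, 8, 21).reshape((3, 7))

# KD-tree answer: one nearest target per row of `elements`.
tree = scipy.spatial.cKDTree(targets)
distances, indices = tree.query(elements)

# Broadcasting answer: (3, 1, 7) - (1, 1000, 7) -> (3, 1000, 7).
diff = elements[:, numpy.newaxis, :] - targets[numpy.newaxis, :, :]
brute = numpy.sqrt((diff ** 2).sum(axis=-1))   # shape (3, 1000)
brute_indices = brute.argmin(axis=1)           # nearest target per element
brute_distances = brute.min(axis=1)            # distance to that target

print(numpy.array_equal(indices, brute_indices))
print(numpy.allclose(distances, brute_distances))
```

Both approaches avoid the explicit loop; the broadcasting version is simpler but does work proportional to n_elements * n_targets, which the tree is designed to avoid.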

Richard Hattersley


On 2 May 2012 19:06, Moroney, Catherine M (388D) <
Catherine.M.Moroney at jpl.nasa.gov> wrote:

> Hello,
>
> Can somebody give me some hints as to how to code up this function
> in pure python, rather than dropping down to Fortran?
>
> I want to compare a 7-element vector (called "element") to a large
> list of similarly-dimensioned
> vectors (called "target"), and pick out the vector in "target" that is
> closest to "element"
> (determined by minimizing the Euclidean distance).
>
> For instance, in (slow) brute force form it would look like:
>
> element = numpy.array([1, 2, 3, 4, 5, 6, 7])
> target  = numpy.array(range(0, 49)).reshape(7,7)*0.1
>
> min_length = 9999.0
> min_index  = -1
> for i in xrange(0, 7):
>   distance = (element-target[i])**2
>   distance = numpy.sqrt(distance.sum())
>   if (distance < min_length):
>      min_length = distance
>      min_index  = i
>
> Now of course, the actual problem will be of a much larger scale.  I will
> have
> an array of elements, and a large number of potential targets.
>
> I was thinking of having element be an array where each element itself is
> a numpy.ndarray, and then vectorizing the code above so as an output I
> would
> have an array of the "min_index" and "min_length" values.
>
> I can get the following simple test to work so I may be on the right track:
>
> import numpy
>
> dtype = [("x", numpy.ndarray)]
>
> def single(data):
>    return data[0].min()
>
> multiple = numpy.vectorize(single)
>
> if __name__ == "__main__":
>
>    a = numpy.arange(0, 16).reshape(4,4)
>    b = numpy.recarray((4), dtype=dtype)
>    for i in xrange(0, b.shape[0]):
>        b[i]["x"] = a[i,:]
>
>    print a
>    print b
>
>    x = multiple(b)
>    print x
>
> What is the best way of constructing "b" from "a"?  I tried b =
> numpy.recarray((4), dtype=dtype, buf=a)
> but I get a segmentation fault when I try to print b.
>
> Is there a way to perform this larger task efficiently with record arrays
> and vectorization, or
> am I off on the wrong track completely?  How can I do this efficiently
> without dropping
> down to Fortran?
>
> Thanks for any advice,
>
> Catherine