[Numpy-discussion] record arrays and vectorizing

Wed May 2 14:06:52 EDT 2012

Hello,

Can somebody give me some hints on how to code up this function
in pure Python, rather than dropping down to Fortran?

I want to compare a 7-element vector (called "element") to a large list of vectors of the same
length (called "target"), and pick out the vector in "target" that is closest to "element"
(determined by minimizing the Euclidean distance).

For instance, in (slow) brute force form it would look like:

import numpy

element = numpy.array([1, 2, 3, 4, 5, 6, 7])
target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

min_length = 9999.0
min_index  = None   # no closest row found yet
for i in xrange(0, 7):
   # compare "element" against row i of "target" only
   distance = (element-target[i])**2
   distance = numpy.sqrt(distance.sum())
   if (distance < min_length):
      min_length = distance
      min_index  = i
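
For comparison, the same single-vector search can be sketched without the explicit
loop by letting numpy broadcast "element" against every row of "target". This is only
a sketch, assuming "target" is a 2-D array with one candidate vector per row; the
helper name closest_row is made up for illustration.

```python
import numpy

def closest_row(element, target):
    """Return (index, distance) of the row of target nearest to element."""
    # Broadcasting subtracts element from every row of target at once.
    diff = target - element
    # Euclidean distance of each row: sqrt of the row-wise sum of squares.
    distances = numpy.sqrt((diff ** 2).sum(axis=1))
    index = distances.argmin()
    return index, distances[index]

element = numpy.array([1, 2, 3, 4, 5, 6, 7])
target = numpy.arange(49).reshape(7, 7) * 0.1

min_index, min_length = closest_row(element, target)
```

Here argmin replaces the running comparison against min_length, so no sentinel
starting value such as 9999.0 is needed.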

Now of course, the actual problem will be of a much larger scale.  I will have
an array of elements, and a large number of potential targets.  

I was thinking of making "element" an array in which each entry is itself
a numpy.ndarray, and then vectorizing the code above so that the output is
an array of the "min_index" and "min_length" values.
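
One possible way to get those two output arrays, sketched here with plain 2-D arrays
rather than record arrays, is to broadcast all elements against all targets at once.
The assumptions are that the elements are stacked as rows of an (n, d) array and the
targets as rows of an (m, d) array; the names closest_rows, elements and targets are
illustrative only.

```python
import numpy

def closest_rows(elements, targets):
    """For each row of elements, find the nearest row of targets.

    elements has shape (n, d) and targets has shape (m, d).
    Returns (min_index, min_length), both of shape (n,).
    """
    # Shape (n, 1, d) minus shape (m, d) broadcasts to (n, m, d).
    diff = elements[:, numpy.newaxis, :] - targets
    # Distance between every element/target pair, shape (n, m).
    distances = numpy.sqrt((diff ** 2).sum(axis=2))
    min_index = distances.argmin(axis=1)
    min_length = distances.min(axis=1)
    return min_index, min_length

elements = numpy.array([[1, 2, 3, 4, 5, 6, 7],
                        [0, 0, 0, 0, 0, 0, 0]], dtype=float)
targets = numpy.arange(49).reshape(7, 7) * 0.1

min_index, min_length = closest_rows(elements, targets)
```

The intermediate array holds n*m*d values, so for a very large problem the elements
would probably have to be processed in chunks to keep memory use under control.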

I can get the following simple test to work so I may be on the right track:

import numpy

dtype = [("x", numpy.ndarray)]

def single(data):
    return data[0].min()

multiple = numpy.vectorize(single)

if __name__ == "__main__":

    a = numpy.arange(0, 16).reshape(4,4)
    b = numpy.recarray((4), dtype=dtype)
    for i in xrange(0, b.shape[0]):
        b[i]["x"] = a[i,:]

    print a
    print b

    x = multiple(b)
    print x

What is the best way of constructing "b" from "a"?  I tried b = numpy.recarray((4), dtype=dtype, buf=a)
but I get a segmentation fault when I try to print b.
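
A likely reason for the crash is that a field declared as numpy.ndarray is stored as a
generic Python object, so passing buf=a makes the integers in "a" get read as object
pointers. A sketch of one alternative, assuming every row has the same fixed length,
is to declare "x" as a sub-array field and view the existing data through that dtype.
The name row_dtype is made up for illustration.

```python
import numpy

a = numpy.arange(0, 16).reshape(4, 4)

# One record per row of "a": field "x" is a sub-array of 4 values of a's dtype.
row_dtype = numpy.dtype([("x", a.dtype, (4,))])

# Reinterpret the same memory as 4 records; no data is copied.
b = a.view(row_dtype).reshape(-1).view(numpy.recarray)

# b.x is again a (4, 4) array, so row-wise operations need no vectorize call.
row_minima = b.x.min(axis=1)
```

Because "b" is a view, it shares its data with "a"; a copy would be needed if the
two are meant to change independently.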

Is there a way to perform this larger task efficiently with record arrays and vectorization, or
am I off on the wrong track completely?  How can I do this efficiently without dropping
down to Fortran?

Thanks for any advice,

Catherine