[Numpy-discussion] record arrays initialization

Moroney, Catherine M (388D) Catherine.M.Moroney at jpl.nasa.gov
Wed May 2 17:45:44 EDT 2012


Thanks to Perry for some very useful off-list conversation. I realize that
my earlier description of the problem wasn't clear at all, so here it is
in a nutshell:

Find the best match in an array t(5000, 7) for a single vector e(7).  Now scale
it up: e is (128, 512, 7), and I want to return a (128, 512) array of the t-identifiers
that are the best match for each vector in e.  "Best match" is defined as the minimum
Euclidean distance.

I'm going to try three ways: (a) brute force and lots of looping in Python,
(b) constructing a function to find the match for a single instance of e and
vectorizing it, and (c) coding it in Fortran.  I'll be curious to see the
performance figures.
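
For comparison with the three approaches above, here is a minimal sketch of a
fourth option that stays in NumPy and replaces the inner loops with a matrix
product. It assumes e is a plain float array of shape (..., 7) and t is (5000, 7).
The function name best_match and the chunk parameter are made up for this
illustration. It relies on |e - t|^2 = |e|^2 - 2 e.t + |t|^2, drops the |e|^2
term (constant for each e) and skips the square root, since neither changes which
row is the minimum. Rounding can differ slightly from the direct formula when two
candidates are nearly tied.

```python
import numpy

def best_match(e, t, chunk=4096):
    """For each vector in e (..., k), return the row index of t (m, k)
    with the smallest Euclidean distance."""
    e = numpy.asarray(e, dtype=float)
    t = numpy.asarray(t, dtype=float)
    flat = e.reshape(-1, e.shape[-1])        # (n, k)
    t_sq = (t ** 2).sum(axis=1)              # (m,)
    out = numpy.empty(flat.shape[0], dtype=numpy.intp)
    # Work in chunks so the (chunk, m) distance block stays small in memory.
    for start in range(0, flat.shape[0], chunk):
        block = flat[start:start + chunk]    # (c, k)
        # Squared distance minus |e|^2, which does not affect the argmin.
        d2 = t_sq - 2.0 * block.dot(t.T)     # (c, m)
        out[start:start + chunk] = d2.argmin(axis=1)
    return out.reshape(e.shape[:-1])
```

With e of shape (128, 512, 7) this would return a (128, 512) array of row
indices into t. The chunk size trades memory for fewer Python-level iterations.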

Two smaller questions:

A)  How do I most efficiently construct a record array from a single array?
I want to do the following, but it segfaults on me when I try to print b.

import numpy

vtype = [("x", numpy.ndarray)]
a = numpy.arange(0, 16).reshape(4,4)
b = numpy.recarray((4), dtype=vtype, buf=a)

print(a)
print(b)   # this is where it segfaults

What is the most efficient way of constructing b from the values of a?  In real life,
a is (128*512*7) and I want b to be (128, 512) with the x component being a 7-value NumPy array.
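
One possible way to do this without copying is sketched below. It assumes a is
C-contiguous and that the x field should hold a fixed-length subarray rather
than a Python object. Using numpy.ndarray as a field type is most likely what
causes the segfault: NumPy treats it as an object field, so the integers in the
buffer get reinterpreted as object pointers. The sketch instead declares x as a
subarray of a's own element type and views the existing memory through that dtype.

```python
import numpy

a = numpy.arange(0, 16).reshape(4, 4)

# One field "x" holding a whole row of a: same element type, length = last axis.
vtype = [("x", a.dtype, (a.shape[-1],))]

# Viewing through vtype collapses the last axis to length 1; reshape drops it,
# and the final view turns the structured array into a record array.
b = a.view(vtype).reshape(a.shape[:-1]).view(numpy.recarray)

print(b.shape)   # (4,)
print(b.x[1])    # [4 5 6 7]
```

For the real data, a flat array of 128*512*7 values would first need
a.reshape(128, 512, 7); the same three steps should then give a (128, 512)
record array whose x component is a 7-value array. Because b is a view, writing
to b.x also changes a.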

and

B)  If I'm vectorizing a function ("single") to find the best match for
a single element of e within t, how do I pass the entire array t into
the function without having it parcelled down to its individual elements?

i.e.

def single(element, targets):
    nlen = element.shape[0]
    nvec = targets.data.shape[0]
    x = element.reshape(1, nlen).repeat(nvec, axis=0)

    diffs = ((x - targets.data)**2).sum(axis=1)
    diffs = numpy.sqrt(diffs)
    return numpy.argmin(diffs, axis=0)

multiple = numpy.vectorize(single)
x = multiple(all_elements, target)

where all_elements is similar to "b" in my first example, and target
is a 2-d array.  The above code doesn't work because "target" gets reduced
to a single element by the time it reaches "single", and I need to see the
whole array there.

I found a work-around by encapsulating target into a single object and passing
in the object, but I'm curious if there's a better way of doing this.
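
As one alternative to wrapping the target in an object, newer NumPy releases
(1.12 and later, so not available when this was written) let numpy.vectorize
take a signature argument that declares which axes each call should receive
whole. The sketch below assumes all_elements is a plain array of shape (..., k)
rather than a record array, and that targets is a plain (m, k) array; the sample
values are made up for illustration.

```python
import numpy

def single(element, targets):
    # element: (k,) vector; targets: (m, k) array.
    # Returns the index of the row of targets closest to element.
    diffs = ((targets - element) ** 2).sum(axis=1)
    return numpy.argmin(diffs)

# "(k),(m,k)->()": each call gets one k-vector and the entire (m, k) array,
# and produces a scalar.
multiple = numpy.vectorize(single, signature="(k),(m,k)->()")

targets = numpy.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
all_elements = numpy.array([[[0.1, 0.0], [0.9, 1.2]],
                            [[4.0, 6.0], [0.4, 0.4]]])   # shape (2, 2, 2)

x = multiple(all_elements, targets)
print(x)   # [[0 1]
           #  [2 0]]
```

numpy.vectorize still loops in Python underneath, so this is a convenience
rather than a speed-up.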

I hope I've explained myself better this time around,

Catherine

