[Numpy-discussion] record arrays initialization

Stéfan van der Walt stefan at sun.ac.za
Wed May 2 18:24:03 EDT 2012


On Wed, May 2, 2012 at 2:45 PM, Moroney, Catherine M (388D)
<Catherine.M.Moroney at jpl.nasa.gov> wrote:
> Find the best match in an array t(5000, 7) for a single vector e(7).  Now scale
> it up so e is (128, 512, 7) and I want to return a (128, 512) array of the t-identifiers
> that are the best match for e.  "Best match" is defined as the minimum Euclidean distance.
>
> I'm going to try three ways: (a) brute force and lots of looping in python,
> (b) constructing a function to find the match for a single instance of e and
> vectorizing it, and (c) coding it in Fortran.  I'll be curious to see the
> performance figures.

I'd use a mixture of (a) and (b):  break the t(N, 7) up into blocks
of, say, (1000, 7), compute the best match in each using broadcasting,
and then combine your results to find the best of the best.  This
strategy should be best for very large N.  For moderate N, where
broadcasting easily fits into memory, the answer given by the OP to
your original email would do the trick.
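
The blocking idea above could be sketched roughly as follows. The function name
best_match_blocked and the block parameter are made up for illustration and are
not part of NumPy. Note that the intermediate array has shape (M, block, 7),
where M is the number of vectors in e, so for the full (128, 512, 7) case the
block would likely need to be smaller than 1000, or e would need to be blocked
as well.

```python
import numpy as np

def best_match_blocked(t, e, block=1000):
    """For each vector in e, return the row index of t at minimum Euclidean distance.

    Hypothetical helper: t has shape (N, k), e has shape (..., k).
    """
    k = t.shape[1]
    e_flat = e.reshape(-1, k)                      # (M, k)
    m = e_flat.shape[0]
    best_d2 = np.full(m, np.inf)                   # best squared distance so far
    best_idx = np.zeros(m, dtype=np.intp)          # row of t that achieved it

    for start in range(0, t.shape[0], block):
        chunk = t[start:start + block]             # (B, k)
        # Broadcast to (M, B, k), then reduce to squared distances (M, B).
        diff = e_flat[:, None, :] - chunk[None, :, :]
        d2 = (diff ** 2).sum(axis=-1)
        local = d2.argmin(axis=1)                  # best row within this block
        local_d2 = d2[np.arange(m), local]
        better = local_d2 < best_d2                # "best of the best" update
        best_d2[better] = local_d2[better]
        best_idx[better] = start + local[better]

    return best_idx.reshape(e.shape[:-1])
```

Minimising the squared distance gives the same index as minimising the distance
itself, so the square root is skipped.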

> A)  How do I most efficiently construct a record array from a single array?
> I want to do the following, but it segfaults on me when I try to print b.
>
> vtype = [("x", numpy.ndarray)]
> a = numpy.arange(0, 16).reshape(4,4)
> b = numpy.recarray((4), dtype=vtype, buf=a)

I prefer not to use record arrays, and stick to structured arrays:

In [11]: vtype = np.dtype([('x', (np.float64, 4))])

In [12]: a = np.arange(16.).reshape((4,4))

In [13]: a.view(vtype)
Out[13]:
array([[([0.0, 1.0, 2.0, 3.0],)],
       [([4.0, 5.0, 6.0, 7.0],)],
       [([8.0, 9.0, 10.0, 11.0],)],
       [([12.0, 13.0, 14.0, 15.0],)]],
      dtype=[('x', '<f8', (4,))])
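
As a small follow-up sketch (not from the session above): the view has shape
(4, 1), so one way to get a 1-D array of four records is to reshape it. The
variable name b is only illustrative, and np.float64 is used because the
np.float alias was removed in NumPy 1.24.

```python
import numpy as np

vtype = np.dtype([('x', (np.float64, 4))])
a = np.arange(16.).reshape((4, 4))

# The view reinterprets each row of four float64 values as one record;
# no data is copied. The last axis collapses to length 1, hence the reshape.
b = a.view(vtype).reshape(4)

print(b.shape)        # (4,)
print(b['x'].shape)   # (4, 4)

# Because b is a view, writing through the field changes a as well.
b['x'][0, 0] = 100.0
print(a[0, 0])        # 100.0
```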

> B)  If I'm vectorizing a function ("single") to find the best match for
> a single element of e within t, how do I pass the entire array t into
> the function without having it parcelled down to its individual elements?

I think the new dtype just makes your life more difficult here.  Simply do:

In [49]: np.sum(a - elements.T, axis=1)
Out[49]: array([  0.,  16.,  32.,  48.])
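
The definition of elements is not shown above; the printed output is what one
gets if elements is the first row of a. Note also that the expression sums
signed differences. For the minimum Euclidean distance asked about in the
question, one would presumably square the differences first. A minimal sketch,
where the query vector is a made-up example:

```python
import numpy as np

a = np.arange(16.).reshape((4, 4))

# Consistent with the output shown above, assuming elements is a[0].
print(np.sum(a - a[0], axis=1))          # [ 0. 16. 32. 48.]

# Hypothetical query vector, chosen to lie close to the second row of a.
elements = np.array([4.5, 5.0, 6.0, 7.0])

# Squared Euclidean distance from the query to every row of a.
d2 = np.sum((a - elements) ** 2, axis=1)
best = np.argmin(d2)                     # index of the closest row

print(d2)                                # [ 68.25   0.25  60.25 248.25]
print(best)                              # 1
```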

Stéfan


