Fewer dimensions than expected with record array

Hello all,

This question may seem elementary (mostly because it is), but I can't find documentation anywhere as to why the following are true:

>>> import numpy as np
>>> data = [(1,2,3),(4,5,6),(7,8,9)]
>>> dt = [('a',int),('b',int),('c',int)]
>>> normal_array = np.array(data)
>>> record_array = np.array(data, dtype=dt)
>>> print "ndarray has shape %s but record array has shape %s" % \
... (normal_array.shape, record_array.shape)
ndarray has shape (3, 3) but record array has shape (3,)
>>> print "ndarray has %s dimensions but record array has %s dimensions" % \
... (normal_array.ndim, record_array.ndim)
ndarray has 2 dimensions but record array has 1 dimensions

This makes seemingly reasonable things, like using apply_along_axis() over a table of data with named columns, impossible:

>>> np.apply_along_axis(record_array, 1, lambda x: x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/shape_base.py", line 72, in apply_along_axis
    % (axis,nd))
ValueError: axis must be less than arr.ndim; axis=1, rank=0.

What's the reason for this behavior? Is there a way to make such operations work with record arrays?

Thanks,
Alan

On Fri, Apr 29, 2011 at 10:56 PM, Alan Gibson <dyssident@gmail.com> wrote:
What's the reason for this behavior? Is there a way to make such operations work with record arrays?
Each row (record) is treated as a single array element, so the structured array is only 1-D.

If your rows/records hold content that is not homogeneous, then working along axis=1 (across the elements of a record) doesn't make sense. For example, I am currently struggling with a dataset in which two columns are datetimes and the rest are integers.

If you want an array with homogeneous elements (all floats or all ints) with operations along an axis, then larry (la) is, I think, still the best bet. I don't know what the status of the dataarray for numpy is.

Josef
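To make the point about dimensionality concrete, here is a minimal sketch (written for Python 3 and a recent NumPy, which is an assumption; the thread itself used Python 2.6). It shows that a structured array has one element per record, that columns are reached by field name rather than by a second axis, and that asking for axis=1 fails. Note that the documented argument order is np.apply_along_axis(func1d, axis, arr); the call in the original post passes the array first, which would explain why the traceback reports rank=0 rather than rank=1.

```python
import numpy as np

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
dt = [('a', int), ('b', int), ('c', int)]

normal_array = np.array(data)            # shape (3, 3), ndim 2
record_array = np.array(data, dtype=dt)  # shape (3,), ndim 1: one element per record

# Columns of a structured array are selected by field name, not by axis.
column_b = record_array['b']

# With the documented argument order (func1d, axis, arr),
# apply_along_axis works on the plain 2-D array.
col_sums = np.apply_along_axis(np.sum, 0, normal_array)

# The structured array has no axis 1, so this raises an error
# (a ValueError subclass in recent NumPy versions).
error = None
try:
    np.apply_along_axis(lambda x: x, 1, record_array)
except ValueError as exc:
    error = exc

print(normal_array.shape, record_array.shape)
print(column_b, col_sums)
print(type(error).__name__)
```

Per-column operations on the structured array can still be written field by field, for example record_array['b'].sum().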

Another option is to use views. There are times when I want the same array visible as both a structured array and a regular old array, depending on what I'm doing with it, and you can do that:

In [77]: data
Out[77]: [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [80]: dt = [('a',int),('b',int),('c',int)]

In [81]: record_array = np.array(data, dtype=dt)

In [84]: array = record_array.view(dtype=np.int).reshape(-1,3)

In [85]: array
Out[85]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

# array and record_array share the same data:

In [88]: array[:,1] *= 2

In [89]: array
Out[89]:
array([[ 1,  4,  3],
       [ 4, 10,  6],
       [ 7, 16,  9]])

In [90]: record_array
Out[90]:
array([(1, 4, 3), (4, 10, 6), (7, 16, 9)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])

Views are one of the really cool things about numpy.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
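For readers on a current NumPy, here is a hedged sketch of the same view trick. Two things in it are assumptions that go beyond the session above: the np.int alias was removed in newer NumPy releases, so the fields are declared with an explicit np.int64; and numpy.lib.recfunctions.structured_to_unstructured is shown as an alternative way to get a plain 2-D array. The view approach relies on every field having the same type with no padding between fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# Explicit, identical field types so the record is three packed int64 values.
dt = [('a', np.int64), ('b', np.int64), ('c', np.int64)]
record_array = np.array(data, dtype=dt)

# Reinterpret the same memory as a plain (n, 3) integer array.
plain = record_array.view(np.int64).reshape(-1, 3)

# Writing through the view changes the structured array as well.
plain[:, 1] *= 2

# Axis-wise operations now work on the plain view.
row_sums = np.apply_along_axis(np.sum, 1, plain)

# Alternative: let NumPy build the 2-D array from the fields.
unstructured = rfn.structured_to_unstructured(record_array)

print(record_array['b'])
print(row_sums)
print(unstructured.shape)
```

The view is the cheaper option when the fields are homogeneous; for mixed field types a plain view is not meaningful, and a conversion that casts the fields to a common type is needed instead.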
participants (3)
- Alan Gibson
- Christopher Barker
- josef.pktd@gmail.com