Less dimensions than expected with record array
Hello all, This question may seem elementary (mostly because it is), but I can't find documentation anywhere as to why the following are true:
>>> import numpy as np
>>> data = [(1,2,3),(4,5,6),(7,8,9)]
>>> dt = [('a',int),('b',int),('c',int)]
>>> normal_array = np.array(data)
>>> record_array = np.array(data, dtype=dt)
>>> print "ndarray has shape %s but record array has shape %s" % \
... (normal_array.shape, record_array.shape)
ndarray has shape (3, 3) but record array has shape (3,)
>>> print "ndarray has %s dimensions but record array has %s dimensions" % \
... (normal_array.ndim, record_array.ndim)
ndarray has 2 dimensions but record array has 1 dimensions
This makes seemingly reasonable things, like using apply_along_axis() over a table of data with named columns, impossible:

>>> np.apply_along_axis(record_array, 1, lambda x: x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/shape_base.py", line 72, in apply_along_axis
    % (axis,nd))
ValueError: axis must be less than arr.ndim; axis=1, rank=0.

What's the reason for this behavior? Is there a way to make such operations work with record arrays?

Thanks,
Alan
On Fri, Apr 29, 2011 at 10:56 PM, Alan Gibson wrote:
What's the reason for this behavior? Is there a way to make such operations work with record arrays?
Each row (record) is treated as one array element, so the structured array is only 1-D.

If you have rows/records whose content is not homogeneous, then working along axis=1 (across the elements of a record) doesn't make sense. For example, I am currently struggling with data that has 2 datetime columns while the rest are integers.

If you want an array with homogeneous elements (all floats or all ints) and operations along an axis, then larry (la) is, I think, still the best bet. I don't know what the status of the dataarray for numpy is.

Josef
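To make the "one record is one element" point concrete, here is a minimal sketch using the same data as the original post. It only illustrates the behavior and is not a recommended recipe; the names `first`, `col_b`, `row_sums` and `axis_error` are made up for this example. Note also that np.apply_along_axis takes its arguments in the order (func1d, axis, arr), so the sketch passes the function first; axis=1 is still rejected because the structured array has a single axis.

```python
import numpy as np

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
dt = [('a', int), ('b', int), ('c', int)]
record_array = np.array(data, dtype=dt)

# One element per record, so the array has only axis 0.
print(record_array.shape)  # (3,)
print(record_array.ndim)   # 1

# A single element is a whole record; its fields are reached by name.
first = record_array[0]
print(first['a'], first['b'], first['c'])  # 1 2 3

# Selecting a field gives an ordinary 1-D array (a "column").
col_b = record_array['b']
print(col_b.sum())  # 15

# A per-record total can be built field by field when the fields are
# all numeric (illustrative only).
row_sums = sum(record_array[name] for name in record_array.dtype.names)
print(row_sums)  # [ 6 15 24]

# Even with the arguments in the documented order, axis=1 does not exist.
axis_error = None
try:
    np.apply_along_axis(lambda x: x, 1, record_array)
except ValueError as exc:  # recent NumPy raises AxisError, a ValueError subclass
    axis_error = exc
print(type(axis_error).__name__)
```

Working field by field like this stays meaningful even when the fields have different types, which is the case where an axis=1 operation would not be.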
Another option is to use views. There are times when I want the same
array visible as both a structured array and a regular old array,
depending on what I'm doing with it, and you can do that:
In [77]: data
Out[77]: [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
In [80]: dt = [('a',int),('b',int),('c',int)]
In [81]: record_array = np.array(data, dtype=dt)
In [84]: array = record_array.view(dtype=int).reshape(-1,3)
In [85]: array
Out[85]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
# array and record_array share the same data:
In [88]: array[:,1] *= 2
In [89]: array
Out[89]:
array([[ 1,  4,  3],
       [ 4, 10,  6],
       [ 7, 16,  9]])
In [90]: record_array
Out[90]:
array([(1, 4, 3), (4, 10, 6), (7, 16, 9)],
dtype=[('a', '
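As a follow-up sketch, the view trick above can be combined with the apply_along_axis() call from the original question. This assumes every field has the same dtype and the record has no padding; with mixed field types the plain view would not be meaningful. It also uses numpy.lib.recfunctions.structured_to_unstructured, which exists in newer NumPy releases than the one in this thread and is shown here only as an alternative. The names `plain_view`, `plain` and `row_totals` are made up for this example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
dt = [('a', int), ('b', int), ('c', int)]
record_array = np.array(data, dtype=dt)

# Manual view, as in the session above: reinterpret the records as plain
# ints and give each record its own row. Assumes all fields share one dtype.
n_fields = len(record_array.dtype.names)
plain_view = record_array.view(int).reshape(-1, n_fields)
print(plain_view.shape)  # (3, 3)

# Helper from numpy.lib.recfunctions (newer NumPy) that does the conversion.
plain = rfn.structured_to_unstructured(record_array)
print(plain.shape)  # (3, 3)

# With a real second axis, apply_along_axis(func1d, axis, arr) works.
row_totals = np.apply_along_axis(np.sum, 1, plain)
print(row_totals)  # [ 6 15 24]
```

The manual view shares memory with record_array, so writes through either name show up in the other, exactly as in the session above.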
participants (3)
- Alan Gibson
- Christopher Barker
- josef.pktd@gmail.com