[Numpy-discussion] creation of ndarray with dtype=np.object : bug?

Emanuele Olivetti emanuele at relativita.com
Tue Dec 2 06:53:04 EST 2014


Hi,

I am using 2D arrays where only one dimension remains constant, e.g.:
---
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
b = np.array([[9, 8, 7]]) # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
---
I have a large number of them and need to extract subsets of them
through fancy indexing and then stack them together. For this reason
I put them into an array of dtype=np.object, given their non-constant
nature. Indexing works well :) but stacking does not :( , as you can
see in the following example:
---
# fancy indexing :)
data = np.array([a, b, c, d], dtype=np.object)
idx = [0, 1, 3]
print(data[idx])
In [1]:
[[[1 2 3]
  [4 5 6]] [[9 8 7]] [[5 5 4]
  [4 3 3]]]

# stacking :(
data2 = np.array([a, b, c], dtype=np.object)
data3 = np.array([a, d], dtype=np.object)
together = np.vstack([data2, data3])
In [2]:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-7ebee5709e29> in <module>()
----> 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE

/tmp/python-3276515J.py in <module>()
       1 data2 = np.array([a, b, c], dtype=np.object)
       2 data3 = np.array([a, d], dtype=np.object)
----> 3 together = np.vstack([data2, data3])

/usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
     224
     225     """
--> 226     return _nx.concatenate(map(atleast_2d,tup),0)
     227
     228 def hstack(tup):

ValueError: arrays must have same number of dimensions
---
The reason for the error is that data2.shape is "(3,)", while data3.shape is
"(2, 2, 3)".
This happens because the creation of ndarrays with dtype=np.object tries to be
"smart" and infer the common dimensions between the objects you put in the array
instead of just creating an array of the objects you give. This leads to unexpected
results when you use it, like the one in the example, because you cannot control
the resulting shape, which is data dependent. Or at least I cannot find a way to
create data3 with shape (2,)...
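
A possible workaround, offered only as a sketch: preallocate an empty 1-D object array and fill it one element at a time, so that no shape inference takes place. The helper name as_object_array is hypothetical (it is not a NumPy function), and the builtin `object` is used as the dtype here in place of np.object.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
b = np.array([[9, 8, 7]])  # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])  # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])  # 2 x 3


def as_object_array(items):
    """Hypothetical helper: always return a 1-D object array of len(items)."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer indexing stores the array itself as a single element.
        out[i] = item
    return out


# Shape inference: a and d have the same shape, so the result is 3-D.
inferred = np.array([a, d], dtype=object)
print(inferred.shape)

# Explicit filling: the shape depends only on the number of items.
data2 = as_object_array([a, b, c])
data3 = as_object_array([a, d])
print(data2.shape, data3.shape)

# Both are 1-D, so they can be joined along their only axis.
together = np.concatenate([data2, data3])
print(together.shape)

# Fancy indexing still works, and the selected 2D arrays can be stacked
# along their first axis because they all have 3 columns.
stacked = np.vstack(list(together[[0, 1, 3]]))
print(stacked.shape)
```

Note that np.concatenate is used here rather than np.vstack to join the two 1-D object arrays; np.vstack is applied only to the selected 2D arrays themselves.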

How should I address this issue? To me, it looks like a bug in the excellent NumPy.

Best,

Emanuele






