[Numpy-discussion] creation of ndarray with dtype=np.object : bug?

Ryan Nelson rnelsonchem at gmail.com
Tue Dec 2 22:32:13 EST 2014


Emanuele Olivetti <emanuele <at> relativita.com> writes:

> 
> Hi,
> 
> I am using 2D arrays where only one dimension remains constant, e.g.:
> ---
> import numpy as np
> a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
> b = np.array([[9, 8, 7]]) # 1 x 3
> c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
> d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
> ---
> I have a large number of them and need to extract subsets of them
> through fancy indexing and then stack them together. For this reason
> I put them into an array of dtype=np.object, given their non-constant
> nature. Indexing works well :) but stacking does not :( , as you can
> see in the following example:
> ---
> # fancy indexing :)
> data = np.array([a, b, c, d], dtype=np.object)
> idx = [0, 1, 3]
> print(data[idx])
> In [1]:
> [[[1 2 3]
>   [4 5 6]] [[9 8 7]] [[5 5 4]
>   [4 3 3]]]
> 
> # stacking :(
> data2 = np.array([a, b, c], dtype=np.object)
> data3 = np.array([a, d], dtype=np.object)
> together = np.vstack([data2, data3])
> In [2]:
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-14-7ebee5709e29> in <module>()
> ----> 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE
> 
> /tmp/python-3276515J.py in <module>()
>        1 data2 = np.array([a, b, c], dtype=np.object)
>        2 data3 = np.array([a, d], dtype=np.object)
> ----> 3 together = np.vstack([data2, data3])
> 
> /usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
>      224
>      225     """
> --> 226     return _nx.concatenate(map(atleast_2d,tup),0)
>      227
>      228 def hstack(tup):
> 
> ValueError: arrays must have same number of dimensions
> ----
> The reason for the error is that data2.shape is "(3,)", while data3.shape is
> "(2, 2, 3)".
> This happens because creating an ndarray with dtype=np.object tries to be
> "smart" and infer the common dimensions of the objects you put in the array,
> instead of just creating an array of the objects you give. This leads to
> unexpected results, like the one in the example, because you cannot control
> the resulting shape, which is data-dependent. Or at least I cannot find a way
> to create data3 with shape (2,)...
> 
> How should I address this issue? To me, it looks like a bug in the excellent
> NumPy.
> 
> Best,
> 
> Emanuele
> 

Emanuele,

This doesn't address your question directly. However, I wonder if you 
could approach this problem in a different way to get what you want.

First of all, create an "index" array, and then just vstack all of your 
arrays at once.

-----
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
b = np.array([[9, 8, 7]]) # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3

all_array = [a, b, c, d]

# label every row with the index of the array it came from
z = []
for n, arr in enumerate(all_array):
    z.extend([n] * arr.shape[0])  # one label per row of arr
z = np.array(z)

varrays = np.vstack(all_array)
----

Now z looks like this `array([0, 0, 1, 2, 2, 2, 3, 3])` and varrays is a 
vstack of all your data.
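A loop is not strictly needed to build the labels. A minimal alternative sketch, assuming the same four arrays, uses np.repeat to repeat each array's index once per row; the same row counts also give the offsets that np.split needs to cut varrays back into the original pieces.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])             # 2 x 3
b = np.array([[9, 8, 7]])                        # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])  # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])             # 2 x 3
all_array = [a, b, c, d]

# Number of rows each array contributes to the stack.
lengths = [arr.shape[0] for arr in all_array]

# Repeat each array's index once per row: [0, 0, 1, 2, 2, 2, 3, 3].
z = np.repeat(np.arange(len(all_array)), lengths)
varrays = np.vstack(all_array)

# Cumulative row counts mark where each original array ends, so
# np.split can cut varrays back into the original pieces.
pieces = np.split(varrays, np.cumsum(lengths)[:-1])
```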

To select one of your arrays, you can do something like the following.

-----

[In]: varrays[ z == 2 ] # Array c

[Out]:
array([[1, 3, 5],
       [7, 9, 8],
       [6, 4, 2]])
-----
Now, if you want to select both arrays b and d, for example, you would 
need a boolean array that looks like this:
array([False, False, True, False, False, False, True, True])
I think there is some NumPy black magic that lets you do this easily 
(e.g. `i_wish = z == [1,3]`), but right now I can only think of how 
to do this with a loop:

----
idxs = np.zeros(z.shape, dtype=bool)
for i in [1, 3]:
    idxs = np.logical_or(idxs, z == i)  # also mark the rows of array i
idxs

----

This lets you select from the large stacked array and get the vstacked 
arrays automatically.

----
[In]: varrays[idxs]
[Out]:
array([[9, 8, 7],
       [5, 5, 4],
       [4, 3, 3]])
-----
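The black magic wished for above does exist: np.isin(z, [1, 3]) returns a boolean array marking which elements of z appear in [1, 3], which is the same mask the loop builds (on NumPy older than 1.13, np.in1d does the same for 1-D input). The sketch below rebuilds z and varrays with np.repeat so it runs on its own. The second part, using np.flatnonzero to gather rows in a chosen order, is an extra assumption about what you might want, since a boolean mask always returns rows in their stacked order.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])             # 2 x 3
b = np.array([[9, 8, 7]])                        # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])  # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])             # 2 x 3
all_array = [a, b, c, d]

z = np.repeat(np.arange(len(all_array)), [arr.shape[0] for arr in all_array])
varrays = np.vstack(all_array)

# True wherever z's label is one of the wanted arrays (here b and d).
idxs = np.isin(z, [1, 3])
selected = varrays[idxs]  # rows of b, then rows of d

# A boolean mask keeps rows in stacked order. To follow a chosen order
# instead (d before b), gather integer row positions per label.
order = np.concatenate([np.flatnonzero(z == k) for k in [3, 1]])
selected_d_first = varrays[order]
```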

Sorry if this does not help. Just spit-balling...
Ryan




