[Numpy-discussion] creation of ndarray with dtype=np.object : bug?
Ryan Nelson
rnelsonchem at gmail.com
Tue Dec 2 22:32:13 EST 2014
Emanuele Olivetti <emanuele <at> relativita.com> writes:
>
> Hi,
>
> I am using 2D arrays where only one dimension remains constant, e.g.:
> ---
> import numpy as np
> a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
> b = np.array([[9, 8, 7]]) # 1 x 3
> c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
> d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
> ---
> I have a large number of them and need to extract subsets of them
> through fancy indexing and then stack them together. For this reason
> I put them into an array of dtype=np.object, given their non-constant
> nature. Indexing works well :) but stacking does not :( , as you can
> see in the following example:
> ---
> # fancy indexing :)
> data = np.array([a, b, c, d], dtype=np.object)
> idx = [0, 1, 3]
> print(data[idx])
> In [1]:
> [[[1 2 3]
> [4 5 6]] [[9 8 7]] [[5 5 4]
> [4 3 3]]]
>
> # stacking :(
> data2 = np.array([a, b, c], dtype=np.object)
> data3 = np.array([a, d], dtype=np.object)
> together = np.vstack([data2, data3])
> In [2]:
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-14-7ebee5709e29> in <module>()
> ----> 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE
>
> /tmp/python-3276515J.py in <module>()
> 1 data2 = np.array([a, b, c], dtype=np.object)
> 2 data3 = np.array([a, d], dtype=np.object)
> ----> 3 together = np.vstack([data2, data3])
>
> /usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
> 224
> 225 """
> --> 226 return _nx.concatenate(map(atleast_2d,tup),0)
> 227
> 228 def hstack(tup):
>
> ValueError: arrays must have same number of dimensions
> ----
> The reason for the error is that data2.shape is "(3,)", while data3.shape
> is "(2, 2, 3)".
> This happens because the creation of ndarrays with dtype=np.object tries
> to be "smart" and infer the common dimensions between the objects you put
> in the array instead of just creating an array of the objects you give.
> This leads to unexpected results when you use it, like the one in the
> example, because you cannot control the resulting shape, which is data
> dependent. Or at least I cannot find a way to create data3 with shape
> (2,)...
>
> How should I address this issue? To me, it looks like a bug in the
> excellent NumPy.
>
> Best,
>
> Emanuele
>
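For reference, one way to get an object array of shape (2,) is to allocate the
array first and fill it one slot at a time, so that NumPy never has a chance to
infer extra dimensions. This is only a minimal sketch: the helper name
`as_object_array` is made up for illustration, and it uses the builtin `object`
as the dtype because the `np.object` alias is no longer available in recent
NumPy releases.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])             # 2 x 3
b = np.array([[9, 8, 7]])                        # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])  # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])             # 2 x 3


def as_object_array(arrays):
    """Hypothetical helper: one object slot per input array, no shape inference."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr  # element-wise assignment stores the array itself
    return out


data2 = as_object_array([a, b, c])  # shape (3,)
data3 = as_object_array([a, d])     # shape (2,), not (2, 2, 3)

# Both are 1-D object arrays now, so they can be joined end to end.
together = np.concatenate([data2, data3])  # shape (5,)

# Fancy-index the object array, then stack the selected 2D arrays row-wise.
subset = np.vstack(list(together[[0, 1, 4]]))  # rows of a, b and d
```

With both containers guaranteed to be 1-D, the fancy indexing from the original
example keeps working, and the final stacking is done on the selected 2D arrays
rather than on the object arrays themselves.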
Emanuele,
This doesn't address your question directly. However, I wonder if you
could approach the problem a different way to get what you want.
First, create an "index" array that records which source array each row
came from, and then just vstack all of your arrays at once.
-----
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
b = np.array([[9, 8, 7]]) # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
all_array = [a, b, c, d]
# z[k] is the position in all_array of the array that supplies row k
# of the stacked data
z = np.repeat(np.arange(len(all_array)), [arr.shape[0] for arr in all_array])
varrays = np.vstack(all_array)
----
Now z looks like this `array([0, 0, 1, 2, 2, 2, 3, 3])` and varrays is a
vstack of all your data.
To select one of your arrays, you can do something like the following.
-----
[In]: varrays[ z == 2 ] # Array c
[Out]:
array([[1, 3, 5],
[7, 9, 8],
[6, 4, 2]])
-----
Now, if you want to select both arrays b and d, for example, you would
need a boolean array that looks like this:
array([False, False, True, False, False, False, True, True])
I think there is some NumPy black magic that lets you do this easily
(e.g. `i_wish = z == [1,3]`), but right now I can only think of how
to do this with a loop:
----
idxs = np.zeros(z.shape, dtype=bool)
for i in [1, 3]:
    # mark every row that belongs to array i
    idxs = np.logical_or(idxs, z == i)
idxs
----
This lets you select from the large array and get the vstacked arrays
automatically.
----
[In]: varrays[idxs]
[Out]:
array([[9, 8, 7],
[5, 5, 4],
[4, 3, 3]])
-----
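The "black magic" wished for above probably exists already as a set-membership
test: `np.isin` returns a boolean array marking which entries of its first
argument appear in the second. The sketch below rebuilds `z` and `varrays` by
hand so that it runs on its own; it assumes a NumPy recent enough to provide
`np.isin` (older releases offered `np.in1d` for 1-D input instead).

```python
import numpy as np

# Same values as z and varrays in the example above, written out by hand.
z = np.array([0, 0, 1, 2, 2, 2, 3, 3])
varrays = np.array([[1, 2, 3], [4, 5, 6],             # a
                    [9, 8, 7],                        # b
                    [1, 3, 5], [7, 9, 8], [6, 4, 2],  # c
                    [5, 5, 4], [4, 3, 3]])            # d

# True wherever the row label is 1 (array b) or 3 (array d).
idxs = np.isin(z, [1, 3])

# Boolean indexing returns the matching rows, already stacked.
selected = varrays[idxs]
```

This should give the same mask as the loop, without building it one label at a
time.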
Sorry if this does not help. Just spit-balling...
Ryan