[Numpy-discussion] creation of ndarray with dtype=np.object : bug?

Wed Dec 3 07:02:22 EST 2014

On 12/03/2014 12:17 PM, Jaime Fernández del Río wrote:
>
>
> The safe way to create 1D object arrays from a list is by preallocating them, 
> something like this:
>
> >>> a = [np.random.rand(2, 3), np.random.rand(2, 3)]
> >>> b = np.empty(len(a), dtype=object)
> >>> b[:] = a
> >>> b
> array([ array([[ 0.124382  ,  0.04489531,  0.93864908],
>        [ 0.77204758,  0.63094413,  0.55823578]]),
>        array([[ 0.80151723,  0.33147467,  0.40491018],
>        [ 0.09905844,  0.90254708,  0.69911945]])], dtype=object)
>
>

Thank you for the compact way to create 1D object arrays. Definitely
useful!

>
> As to why np.array tries to be smart, keep in mind that there are other 
> applications of object arrays than having stacked sequences. The following 
> code computes the 100-th Fibonacci number using the matrix form of the 
> recursion (http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form), numpy's 
> linear algebra capabilities, and Python's arbitrary precision ints:
>
> >>> a = np.array([[0, 1], [1, 1]], dtype=object)
> >>> np.linalg.matrix_power(a, 99)[0, 0]
> 135301852344706746049L
>
> Trying to do this with any other type would result in either wrong results due 
> to overflow:
>
> [...]

I think the problem I am describing is not limited to stacked
sequences; it is more general.

Moreover, I agree with you on the example you present: array creation
explores the list of lists and creates a 2D array of Python ints instead
of np.int64. Exploring iterable containers is certainly correct in general.
I am wondering whether it should be prevented in some cases where the
semantics are clear from the syntax, e.g. when the type of the container
changes from one nesting level to the next (a list holding ndarrays, see
below).

To me this is intuitive and correct:
 >>> a = np.array([[0, 1], [1, 1]], dtype=object)
 >>> a.shape
(2, 2)
while this is counterintuitive and potentially error-prone:
 >>> b = np.array([np.array([0, 1]), np.array([0, 1])], dtype=object)
 >>> b.shape
(2, 2)
because it is clear that I meant a list of two vectors, i.e. an array of
shape (2,), and not a 2D array of shape (2, 2).
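
For reference, here is a minimal sketch of one way to get the shape (2,)
result explicitly. It assumes a hypothetical helper, as_object_vector,
which is not part of NumPy: it preallocates a 1D object array and fills
it one slot at a time, so np.array never gets the chance to explore the
inner arrays.

```python
import numpy as np


def as_object_vector(items):
    """Hypothetical helper (not a NumPy function): store each item of
    `items` in its own slot of a 1D object array, without unpacking it."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer indexing assigns the object itself to slot i.
        out[i] = item
    return out


vectors = [np.array([0, 1]), np.array([0, 1])]
b = as_object_vector(vectors)
print(b.shape)  # (2,)
```

The explicit loop is more verbose than a single call to np.array, but it
does not depend on how the inner containers happen to be shaped.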

Best,

Emanuele