[Numpy-discussion] Indexing empty dimensions with empty arrays

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Dec 28 07:52:39 EST 2011

On 12/28/2011 09:33 AM, Ralf Gommers wrote:
> 2011/12/27 Jordi Gutiérrez Hermoso <jordigh at octave.org>
>     On 26 December 2011 14:56, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>      >
>      > On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd at gmail.com> wrote:
>      >> I have a hard time thinking through empty 2-dim arrays, and don't know
>      >> what rules should apply.
>      >> However, in my code I might want to catch these cases rather early
>      >> than late and then having to work my way backwards to find out where
>      >> the content disappeared.
>      >
>      > Same here. Almost always, my empty arrays are either due to bugs or they
>      > signal that I do need to special-case something. Silent passing through of
>      > empty arrays to all numpy functions is not what I would want.
>
>     I find it quite annoying to treat the empty set with special
>     deference. "All of my great-grandkids live in Antarctica" should be
>     true for me (I'm only 30 years old). If you decide that is not true
>     for me, it leads to a bunch of other logical annoyances up there
>
> Guess you don't mean true/false, because it's neither. But I understand
> you want an empty array back instead of an error.
> Currently the problem is that when you do get that empty array back,
> you'll then use that for something else and it will probably still
> crash. Many numpy functions do not check for empty input and will still
> give exceptions. My impression is that you're better off handling these
> where you create the empty array, rather than in some random place later
> on. The alternative is to have consistent rules for empty arrays, and
> handle them explicitly in all functions. Can be done, but is of course a
> lot of work and has some overhead.

Are you saying that the existence of other bugs means that this bug 
shouldn't be fixed? I just fail to see the relevance of these other bugs 
to this discussion.

For the record, I've encountered this bug many times myself and it's 
rather irritating, since it leads to more verbose code.

It is useful whenever you want to return data that is a subset of the
input data, since the selected subset can sometimes be zero-sized
(remember, in computer science the only numbers are 0, 1, and
"any number").
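A small illustration, assuming a recent NumPy (this sketch is not from the original post): boolean-mask selection, a common way to pick a data-dependent subset, already returns a zero-sized array of the right shape when nothing matches, so "no matches" behaves like any other subset size.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)  # column sums: 12, 15, 18, 21

# Keep the columns whose sum exceeds a threshold. No column passes 100,
# so the mask is all False and the selection is zero-sized.
B = A[:, A.sum(axis=0) > 100]
print(B.shape)   # (3, 0)

# The same expression with a lower threshold keeps two columns.
B2 = A[:, A.sum(axis=0) > 16]
print(B2.shape)  # (3, 2)
```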

Here's one of the examples I've had. The Interpolative Decomposition
decomposes an m-by-n matrix A of rank k as

A = B C

where B is an m-by-k matrix consisting of a subset of the columns of A, 
and C is a k-by-n matrix.
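To make the shapes concrete, here is a sketch that only *checks* a given decomposition A = B C; it does not compute one. The helper `check_id` and both example matrices are hypothetical. The first case has k = 2 (the third column is the sum of the first two); the second is the all-zero case with k = 0, where B is m-by-0 and C is 0-by-n.

```python
import numpy as np

def check_id(A, selected_columns, C):
    """Hypothetical helper: form B = A[:, selected_columns] (m-by-k)
    and report whether B @ C reproduces A."""
    B = A[:, selected_columns]
    return B, np.allclose(B @ C, A)

# Rank 2: column 2 equals column 0 plus column 1.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])
C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])  # k-by-n, identity on the selected columns
B, ok = check_id(A, [0, 1], C)
print(B.shape, ok)    # (3, 2) True

# All zeros: k = 0, so no columns are selected and C is 0-by-n.
Z = np.zeros((3, 4))
C0 = np.zeros((0, 4))
B0, ok0 = check_id(Z, [], C0)
print(B0.shape, ok0)  # (3, 0) True
```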

Now, if A is all zeros (which is often the case for me), then k is 0. I 
would still like to create the m-by-0 matrix B by doing

B = A[:, selected_columns]

But now I have to do this instead:

if len(selected_columns) == 0:
    B = np.zeros((A.shape[0], 0), dtype=A.dtype)
else:
    B = A[:, selected_columns]
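For comparison, a sketch assuming a recent NumPy (behaviour in the 2011 releases discussed here may differ): plain indexing handles the empty case as long as the index has an integer dtype. An empty Python list works directly, but `np.array([])` defaults to `float64`, which NumPy rejects as an index, so casting to `np.intp` is one way to avoid the special case. The helper name `take_columns` is hypothetical.

```python
import numpy as np

def take_columns(A, selected_columns):
    """Hypothetical helper: return A[:, selected_columns], including
    when no columns are selected.

    Casting to np.intp covers an empty float array such as np.array([]),
    which is not a valid index by itself. It assumes any non-empty
    index values are already integral.
    """
    idx = np.asarray(selected_columns, dtype=np.intp)
    return A[:, idx]

A = np.zeros((3, 4))
print(take_columns(A, []).shape)            # (3, 0)
print(take_columns(A, np.array([])).shape)  # (3, 0)
print(take_columns(A, [0, 2]).shape)        # (3, 2)
```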

In this case, zero-sized B and C are of course perfectly valid and 
useful results:

In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
Out[2]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
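The zeros in that result are not a special case: each entry of an m-by-0 times 0-by-n product is a sum over zero terms, and an empty sum is 0, the arithmetic counterpart of the vacuous truth raised earlier in the thread (for which `np.all` of an empty array gives True). A quick check using standard NumPy calls only:

```python
import numpy as np

B = np.ones((3, 0))  # m-by-k with k = 0
C = np.ones((0, 5))  # k-by-n with k = 0

P = B @ C  # same as np.dot(B, C) for 2-D inputs

# Entry (i, j) is sum(B[i, :] * C[:, j]), a sum over an empty axis.
entry = np.sum(B[0, :] * C[:, 0])

print(P.shape)                              # (3, 5)
print(entry)                                # 0.0
print(np.array_equal(P, np.zeros((3, 5))))  # True
print(np.all([]))                           # True: "all" over nothing holds
```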

Dag Sverre
