[Numpy-discussion] Indexing empty dimensions with empty arrays

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Dec 28 07:57:55 EST 2011

On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
> On 12/28/2011 09:33 AM, Ralf Gommers wrote:
>> 2011/12/27 Jordi Gutiérrez Hermoso <jordigh at octave.org>
>>      On 26 December 2011 14:56, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>       >
>>       >
>>       >  On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd at gmail.com> wrote:
>>       >>  I have a hard time thinking through empty 2-dim arrays, and don't know
>>       >>  what rules should apply.
>>       >>  However, in my code I might want to catch these cases early rather
>>       >>  than late, instead of having to work my way backwards to find out
>>       >>  where the content disappeared.
>>       >
>>       >
>>       >  Same here. Almost always, my empty arrays are either due to bugs or they
>>       >  signal that I do need to special-case something. Silently passing empty
>>       >  arrays through all numpy functions is not what I would want.
>>      I find it quite annoying to treat the empty set with special
>>      deference. "All of my great-grandkids live in Antarctica" should be
>>      true for me (I'm only 30 years old). If you decide that is not true
>>      for me, it leads to a bunch of other logical annoyances up there
>> Guess you don't mean true/false, because it's neither. But I understand
>> you want an empty array back instead of an error.
>> Currently the problem is that when you do get that empty array back,
>> you'll then use that for something else and it will probably still
>> crash. Many numpy functions do not check for empty input and will still
>> give exceptions. My impression is that you're better off handling these
>> where you create the empty array, rather than in some random place later
>> on. The alternative is to have consistent rules for empty arrays, and
>> handle them explicitly in all functions. Can be done, but is of course a
>> lot of work and has some overhead.
> Are you saying that the existence of other bugs means that this bug
> shouldn't be fixed? I just fail to see the relevance of these other bugs
> to this discussion.
> For the record, I've encountered this bug many times myself and it's
> rather irritating, since it leads to more verbose code.
> Correct handling of empty index arrays is useful whenever you want to
> return data that is a subset of the input data, since the selected
> subset can sometimes be zero-sized (remember, in computer science the
> only numbers are 0, 1, and "any number").
> Here's one of the examples I've had. The Interpolative Decomposition
> decomposes an m-by-n matrix A of rank k as
> A = B C
> where B is an m-by-k matrix consisting of a subset of the columns of A,
> and C is a k-by-n matrix.
> Now, if A is all zeros (which is often the case for me), then k is 0. I
> would still like to create the m-by-0 matrix B by doing
> B = A[:, selected_columns]
> But now I have to do this instead:
> if len(selected_columns) == 0:
>       B = np.zeros((A.shape[0], 0), dtype=A.dtype)
> else:
>       B = A[:, selected_columns]
> In this case, zero-sized B and C are of course perfectly valid and
> useful results:
> In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
> Out[2]:
> array([[ 0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  0.],
>        [ 0.,  0.,  0.,  0.,  0.]])
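
For readers on a current NumPy release, here is a minimal sketch (assuming a recent NumPy; the thread itself concerns the 2011 behaviour) of the same rank-0 case. An empty *integer* index array selects an m-by-0 matrix directly, so the `len(selected_columns) == 0` branch is not needed, and the zero-sized product behaves as in the session above. One pitfall: `np.array([])` defaults to float64, which is not a valid index dtype.

```python
import numpy as np

A = np.zeros((3, 4))

# An empty index must have an integer dtype; np.array([]) alone is float64.
selected_columns = np.array([], dtype=np.intp)

# Fancy indexing with the empty index keeps the row dimension: B is 3-by-0.
B = A[:, selected_columns]
print(B.shape)  # (3, 0)

# A 0-by-n coefficient matrix C completes the rank-0 factorization A = B C.
C = np.zeros((0, A.shape[1]))

# The product of a (3, 0) and a (0, 4) matrix is a (3, 4) matrix of zeros.
print(np.dot(B, C).shape)  # (3, 4)

# A float-typed empty index is expected to be rejected with IndexError
# in recent NumPy releases, because float64 is not an index dtype.
try:
    A[:, np.array([])]
except IndexError as exc:
    print("IndexError:", exc)
```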

And to answer the obvious question: yes, this is a real use case. It is
used for something similar to image compression, where subsections of
the images may well be all-zero and have zero rank (full story at [1]).
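
To make the zero-rank case concrete, here is a sketch of a column Interpolative Decomposition A = B C with B = A[:, cols]. It assumes a greedy column-pivoted (modified) Gram-Schmidt selection, which is a swapped-in technique chosen for brevity and not the algorithm from [1]. The names `interpolative_decomposition` and `back_substitute` and the absolute tolerance `tol` are hypothetical. The zero-rank path uses only indexing and matrix products on zero-sized operands, and it avoids `np.linalg` solvers so as not to assume anything about their empty-input handling.

```python
import numpy as np

def back_substitute(T, Y):
    """Solve T @ X = Y for upper-triangular T (k-by-k); the loop is empty when k == 0."""
    X = np.zeros((T.shape[0], Y.shape[1]))
    for i in range(T.shape[0] - 1, -1, -1):
        X[i] = (Y[i] - T[i, i + 1:] @ X[i + 1:]) / T[i, i]
    return X

def interpolative_decomposition(A, tol=1e-10):
    """Hypothetical sketch: return (cols, C) with A approximately A[:, cols] @ C.

    Columns are picked greedily by largest residual norm (pivoted Gram-Schmidt);
    tol is an absolute threshold below which remaining columns count as zero.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()                        # residual of every column
    Q = np.zeros((m, min(m, n)))        # orthonormal basis of the chosen columns
    cols = []
    for _ in range(min(m, n)):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= tol:             # everything left is (numerically) zero
            break
        q = R[:, j] / norms[j]
        Q[:, len(cols)] = q
        cols.append(j)
        R -= np.outer(q, q @ R)         # project the new direction out of all columns
    k = len(cols)
    cols = np.array(cols, dtype=np.intp)  # integer dtype, even when k == 0
    Q = Q[:, :k]
    B = A[:, cols]                      # m-by-k; m-by-0 for an all-zero A
    # B = Q T with T = Q^T B upper triangular, so C = T^{-1} Q^T A (k-by-n).
    C = back_substitute(Q.T @ B, Q.T @ A)
    return cols, C
```

With an all-zero A the loop stops before selecting anything, so `cols` is an empty integer array, `A[:, cols]` is m-by-0, `C` is 0-by-n, and their product is the m-by-n zero matrix, with no special case in the calling code.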

Having read the thread above, I understand Ralf's reasoning better, but
really, relying on NumPy's buggy behaviour to discover bugs in user code
seems like the wrong approach. Tools should be dumb unless there are
good reasons to make them smart. I'd be rather irritated with my hammer
if it refused to drive in nails that it decided were in the wrong spot.
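
One way to combine the two positions is to keep indexing itself "dumb" and put the check in user code, at the point where the empty selection is created, as Ralf suggests. A sketch, assuming current NumPy; `select_columns` and its `allow_empty` flag are hypothetical names, not a NumPy API, and `cols` is assumed to hold integer column indices.

```python
import numpy as np

def select_columns(A, cols, allow_empty=True):
    """Hypothetical helper: return A[:, cols], optionally rejecting an empty selection."""
    # cols is assumed to hold integer indices; the explicit dtype makes an
    # empty list or array a valid index instead of a float64 one.
    cols = np.asarray(cols, dtype=np.intp)
    if cols.size == 0 and not allow_empty:
        # Opt-in early failure, for callers who treat "nothing selected" as a bug.
        raise ValueError("no columns selected")
    return A[:, cols]
```

Callers that expect zero-sized results get an m-by-0 array; callers that want to catch them early pass `allow_empty=False`.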

Dag Sverre

[1] http://arxiv.org/abs/1110.4874
