[Numpy-discussion] Indexing empty dimensions with empty arrays

Travis Oliphant travis at continuum.io
Wed Dec 28 09:32:12 EST 2011


I agree with Dag: NumPy should provide consistent handling of empty arrays. It does require some work, but it should at least be declared a bug when it doesn't.

Travis 

--
Travis Oliphant
(on a mobile)
512-826-7480


On Dec 28, 2011, at 7:45 AM, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:

> On 12/28/2011 02:21 PM, Ralf Gommers wrote:
>> 
>> 
>> On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>> 
>>    On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
>>> On 12/28/2011 09:33 AM, Ralf Gommers wrote:
>>>> 
>>>> 
>>>> 2011/12/27 Jordi Gutiérrez Hermoso <jordigh at octave.org>
>>>> 
>>>>     On 26 December 2011 14:56, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>>>> 
>>>>> 
>>>>> On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd at gmail.com> wrote:
>>>>>> I have a hard time thinking through empty 2-dim arrays, and don't know
>>>>>> what rules should apply.
>>>>>> However, in my code I might want to catch these cases rather early
>>>>>> than late and then having to work my way backwards to find out where
>>>>>> the content disappeared.
>>>>> 
>>>>> 
>>>>> Same here. Almost always, my empty arrays are either due to bugs or they
>>>>> signal that I do need to special-case something. Silent passing through of
>>>>> empty arrays to all numpy functions is not what I would want.
>>>> 
>>>>     I find it quite annoying to treat the empty set with special
>>>>     deference. "All of my great-grandkids live in Antarctica" should be
>>>>     true for me (I'm only 30 years old). If you decide that is not true
>>>>     for me, it leads to a bunch of other logical annoyances up there
>>>> 
>>>> 
>>>> Guess you don't mean true/false, because it's neither. But I understand
>>>> you want an empty array back instead of an error.
>>>> 
>>>> Currently the problem is that when you do get that empty array back,
>>>> you'll then use that for something else and it will probably still
>>>> crash. Many numpy functions do not check for empty input and will still
>>>> give exceptions. My impression is that you're better off handling these
>>>> where you create the empty array, rather than in some random place later
>>>> on. The alternative is to have consistent rules for empty arrays, and
>>>> handle them explicitly in all functions. Can be done, but is of course a
>>>> lot of work and has some overhead.
>>> 
>>> Are you saying that the existence of other bugs means that this bug
>>> shouldn't be fixed? I just fail to see the relevance of these other bugs
>>> to this discussion.
>> 
>> 
>> See below.
>> 
>>> For the record, I've encountered this bug many times myself and it's
>>> rather irritating, since it leads to more verbose code.
>>> 
>>> It is useful whenever you want to return data that is a subset of the
>>> input data (since the selected subset can sometimes be zero-sized;
>>> remember, in computer science the only numbers are 0, 1, and "any
>>> number").
>>> 
>>> Here's one of the examples I've had. The Interpolative Decomposition
>>> decomposes an m-by-n matrix A of rank k as
>>> 
>>> A = B C
>>> 
>>> where B is an m-by-k matrix consisting of a subset of the columns of A,
>>> and C is a k-by-n matrix.
>>> 
>>> Now, if A is all zeros (which is often the case for me), then k is 0. I
>>> would still like to create the m-by-0 matrix B by doing
>>> 
>>> B = A[:, selected_columns]
>>> 
>>> But now I have to do this instead:
>>> 
>>> if len(selected_columns) == 0:
>>>      B = np.zeros((A.shape[0], 0), dtype=A.dtype)
>>> else:
>>>      B = A[:, selected_columns]
>>> 
>>> In this case, zero-sized B and C are of course perfectly valid and
>>> useful results:
>>> 
>>> In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
>>> Out[2]:
>>> array([[ 0.,  0.,  0.,  0.,  0.],
>>>         [ 0.,  0.,  0.,  0.,  0.],
>>>         [ 0.,  0.,  0.,  0.,  0.]])
>>> 
>> 
>>    And to answer the obvious question: Yes, this is a real use case. It is
>>    used for something similar to image compression, where sub-sections of
>>    the images may well be all-zero and have zero rank (full story at [1]).
>> 
>> Thanks for the example. I was a little surprised that dot works. Then I
>> read what wikipedia had to say about empty arrays. It mentions dot like
>> you do, and that the determinant of the 0-by-0 matrix is 1. So I try:
>> 
>> In [1]: a = np.zeros((0,0))
>> 
>> In [2]: a
>> Out[2]: array([], shape=(0, 0), dtype=float64)
>> 
>> In [3]: np.linalg.det(a)
>> Parameter 4 to routine DGETRF was incorrect
>> <segfault>
> 
> :-)
> 
> Well, a segfault is most certainly a bug, so this must be fixed one way
> or the other anyway, and returning 1 seems at least as good a
> solution as raising an exception. Both solutions require an extra if-test.
> 
>> 
>>    Reading the above thread I understand Ralf's reasoning better, but
>>    really, relying on NumPy's buggy behaviour to discover bugs in user code
>>    seems like the wrong approach. Tools should be dumb unless there are
>>    good reasons to make them smart. I'd be rather irritated about my hammer
>>    if it refused to drive in nails that it decided were in the wrong spot.
>> 
>> 
>> The point is not that we shouldn't fix it, but that it's a waste of time
>> to fix it in only one place. I remember fixing several functions to
>> explicitly check for empty arrays and then returning an empty array or
>> giving a sensible error.
>> 
>> So can you answer my question: do you think it's worth the time and
>> computational overhead to handle empty arrays in all functions?
> 
> I'd hope the computational overhead is negligible?
> 
> I do believe that handling this correctly everywhere is the right thing 
> to do and would improve overall code quality (as witnessed by the 
> segfault found above).
> 
> Of course, likely nobody is ready to actually perform all that work. So
> the right thing to do seems to be to state that places where NumPy does
> not handle zero-size arrays are bugs, but not do anything about it until
> somebody actually submits a patch. That means ending this email
> discussion by confirming on Trac that this is indeed a bug, and then
> waiting to see if anybody bothers to submit a patch.
> 
> Dag Sverre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
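
For reference, the behaviour asked for in the quoted thread could be sketched roughly as below. This is only an illustration, not a patch to NumPy: the helper names select_columns and det_allowing_empty are hypothetical, and the sketch assumes a NumPy version in which indexing with an empty integer array is supported. The 0-by-0 determinant is handled with an explicit if-test, as suggested in the thread, instead of relying on what np.linalg.det does with empty input.

```python
import numpy as np


def select_columns(A, selected_columns):
    """Hypothetical helper: A[:, selected_columns], also for an empty selection."""
    # Forcing an integer dtype keeps an empty list from becoming a float
    # array, which would not be a valid index.
    idx = np.asarray(selected_columns, dtype=np.intp)
    return A[:, idx]


def det_allowing_empty(a):
    """Hypothetical helper: determinant with the 0-by-0 case tested explicitly."""
    a = np.asarray(a)
    if a.shape == (0, 0):
        # Convention mentioned in the thread: det of the 0-by-0 matrix is 1.
        return 1.0
    return np.linalg.det(a)


A = np.zeros((3, 4))            # all-zero input, so no columns are selected
B = select_columns(A, [])       # m-by-0 matrix, here shape (3, 0)
C = np.ones((0, 4))             # 0-by-n matrix
product = np.dot(B, C)          # 3-by-4 matrix of zeros
```

With helpers along these lines the special case from the thread (the "if len(selected_columns) == 0" branch) is no longer needed in calling code, and B keeps the dtype of A.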


