[Numpy-discussion] Indexing empty dimensions with empty arrays

Wed Dec 28 09:24:02 EST 2011

On Wed, Dec 28, 2011 at 2:45 PM, Dag Sverre Seljebotn <
d.s.seljebotn at astro.uio.no> wrote:

> On 12/28/2011 02:21 PM, Ralf Gommers wrote:
> >
> >
> > On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn
> > <d.s.seljebotn at astro.uio.no> wrote:
> >
> >     On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
> >      > On 12/28/2011 09:33 AM, Ralf Gommers wrote:
> >      >>
> >      >>
> >      >> 2011/12/27 Jordi Gutiérrez Hermoso <jordigh at octave.org>
> >      >>
> >      >>      On 26 December 2011 14:56, Ralf Gommers
> >      >>      <ralf.gommers at googlemail.com> wrote:
> >      >> >
> >      >> >
> >      >> >  On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd at gmail.com> wrote:
> >      >> >>  I have a hard time thinking through empty 2-dim arrays, and
> >      >> >>  don't know what rules should apply.
> >      >> >>  However, in my code I might want to catch these cases early
> >      >> >>  rather than late, instead of having to work my way backwards
> >      >> >>  to find out where the content disappeared.
> >      >> >
> >      >> >
> >      >> >  Same here. Almost always, my empty arrays are either due to
> >      >> >  bugs or they signal that I do need to special-case something.
> >      >> >  Silent passing through of empty arrays to all numpy functions
> >      >> >  is not what I would want.
> >      >>
> >      >>      I find it quite annoying to treat the empty set with special
> >      >>      deference. "All of my great-grandkids live in Antarctica"
> >      >>      should be true for me (I'm only 30 years old). If you decide
> >      >>      that is not true for me, it leads to a bunch of other logical
> >      >>      annoyances up there
> >      >>
> >      >>
> >      >> Guess you don't mean true/false, because it's neither. But I
> >      >> understand you want an empty array back instead of an error.
> >      >>
> >      >> Currently the problem is that when you do get that empty array
> >      >> back, you'll then use that for something else and it will probably
> >      >> still crash. Many numpy functions do not check for empty input and
> >      >> will still give exceptions. My impression is that you're better off
> >      >> handling these where you create the empty array, rather than in
> >      >> some random place later on. The alternative is to have consistent
> >      >> rules for empty arrays, and handle them explicitly in all
> >      >> functions. Can be done, but is of course a lot of work and has
> >      >> some overhead.
> >      >
> >      > Are you saying that the existence of other bugs means that this
> >      > bug shouldn't be fixed? I just fail to see the relevance of these
> >      > other bugs to this discussion.
> >
> >
> > See below.
> >
> >      > For the record, I've encountered this bug many times myself and
> >      > it's rather irritating, since it leads to more verbose code.
> >      >
> >      > Indexing with an empty array is useful whenever you want to return
> >      > data that is a subset of the input data (since the selected subset
> >      > can sometimes be zero-sized -- remember, in computer science the
> >      > only numbers are 0, 1, and "any number").
> >      >
> >      > Here's one of the examples I've had. The Interpolative
> >      > Decomposition decomposes an m-by-n matrix A of rank k as
> >      >
> >      > A = B C
> >      >
> >      > where B is an m-by-k matrix consisting of a subset of the columns
> >      > of A, and C is a k-by-n matrix.
> >      >
> >      > Now, if A is all zeros (which is often the case for me), then k
> >      > is 0. I would still like to create the m-by-0 matrix B by doing
> >      >
> >      > B = A[:, selected_columns]
> >      >
> >      > But now I have to do this instead:
> >      >
> >      > if len(selected_columns) == 0:
> >      >       B = np.zeros((A.shape[0], 0), dtype=A.dtype)
> >      > else:
> >      >       B = A[:, selected_columns]
> >      >
> >      > In this case, zero-sized B and C are of course perfectly valid and
> >      > useful results:
> >      >
> >      > In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
> >      > Out[2]:
> >      > array([[ 0.,  0.,  0.,  0.,  0.],
> >      >          [ 0.,  0.,  0.,  0.,  0.],
> >      >          [ 0.,  0.,  0.,  0.,  0.]])
> >      >
> >
> >     And to answer the obvious question: Yes, this is a real use case. It
> >     is used for something similar to image compression, where
> >     sub-sections of the images may well be all-zero and have zero rank
> >     (full story at [1]).
> >
> > Thanks for the example. I was a little surprised that dot works. Then I
> > read what Wikipedia had to say about empty arrays. It mentions dot like
> > you do, and that the determinant of the 0-by-0 matrix is 1. So I tried:
> >
> > In [1]: a = np.zeros((0,0))
> >
> > In [2]: a
> > Out[2]: array([], shape=(0, 0), dtype=float64)
> >
> > In [3]: np.linalg.det(a)
> > Parameter 4 to routine DGETRF was incorrect
> > <segfault>
>
> :-)
>
> Well, a segfault is most certainly a bug, so this must be fixed one way
> or the other anyway, and returning 1 seems at least as good a
> solution as raising an exception. Both solutions require an extra if-test.
>
> >
> >     Reading the above thread I understand Ralf's reasoning better, but
> >     really, relying on NumPy's buggy behaviour to discover bugs in user
> >     code seems like the wrong approach. Tools should be dumb unless there
> >     are good reasons to make them smart. I'd be rather irritated about my
> >     hammer if it refused to drive in nails that it decided were in the
> >     wrong spot.
> >
> >
> > The point is not that we shouldn't fix it, but that it's a waste of time
> > to fix it in only one place. I remember fixing several functions to
> > explicitly check for empty arrays and then returning an empty array or
> > giving a sensible error.
> >
> > So can you answer my question: do you think it's worth the time and
> > computational overhead to handle empty arrays in all functions?
>
> I'd hope the computational overhead is negligible?
>

If you have to check all array_like inputs in all functions, I wouldn't
think so.
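
To make the cost concrete, here is a minimal sketch of the kind of
per-function check being discussed, using the determinant case from above.
The name det_with_empty_check is made up for this illustration and is not
part of NumPy; returning 1 for the 0-by-0 case follows the convention
mentioned earlier in the thread.

```python
import numpy as np


def det_with_empty_check(a):
    """Determinant with an explicit branch for the 0-by-0 matrix.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    # Every array_like input has to be converted before it can be checked.
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square 2-D array")
    # The extra if-test: by convention the 0-by-0 determinant is 1.
    if a.size == 0:
        return 1.0
    return float(np.linalg.det(a))


empty_det = det_with_empty_check(np.zeros((0, 0)))
regular_det = det_with_empty_check([[2.0, 0.0], [0.0, 3.0]])
```

The check itself is one conversion and one comparison per input; the open
question is how that adds up once it is repeated for every array_like
argument of every function.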

> I do believe that handling this correctly everywhere is the right thing
> to do and would improve overall code quality (as witnessed by the
> segfault found above).
>
> Of course, likely nobody is ready to actually perform all that work. So
> the right thing to do seems to be to state that places where NumPy does
> not handle zero-size arrays are bugs, but not do anything about it until
> somebody actually submits a patch. That means ending this email
> discussion by recording on Trac that this is indeed a bug, and then
> waiting to see if anybody bothers to submit a patch.
>

Agreed. I've created http://projects.scipy.org/numpy/ticket/2007
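
For reference, the behaviour asked for in this thread can be sketched as
follows. This is only an illustration under stated assumptions:
select_columns is a hypothetical helper, not a NumPy function, and it
spells out the empty case explicitly so that it does not depend on how a
particular NumPy version treats empty index arrays.

```python
import numpy as np


def select_columns(A, selected_columns):
    """Return A[:, selected_columns], allowing an empty selection.

    Hypothetical helper for illustration only.
    """
    A = np.asarray(A)
    # An explicit integer dtype keeps an empty list from turning into a
    # float64 array, which is not a valid index.
    idx = np.asarray(selected_columns, dtype=np.intp)
    if idx.size == 0:
        # Build the m-by-0 result by hand, as in the workaround quoted above.
        return np.zeros((A.shape[0], 0), dtype=A.dtype)
    return A[:, idx]


A = np.zeros((3, 5))
B = select_columns(A, [])        # shape (3, 0)
C = np.zeros((0, A.shape[1]))    # shape (0, 5)
product = np.dot(B, C)           # shape (3, 5), all zeros
```

With B and C zero-sized, np.dot still returns the full 3-by-5 zero matrix,
which is the result the Interpolative Decomposition example relies on.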

Ralf